Abstract

Blackwell approachability, regret minimization and calibration are three criteria
evaluating a strategy (or an algorithm) in different sequential decision problems, or
repeated games between a player and Nature. Although they have at first sight nothing
in common, links between them have been discovered: both consistent and calibrated
strategies can be constructed by following, in some auxiliary game, an approachability
strategy.

We gather famous or recent results and provide new ones in order to develop
and generalize Blackwell's elegant theory. The final goal is to show how it can
be used as a basic yet powerful tool to exhibit a new class of intuitive algorithms,
based on simple geometric properties. For the sake of completeness, we also prove that
approachability can be seen as a byproduct of the very existence of consistent or
calibrated strategies.

Introduction 

Sequential decision problems can be represented as repeated games between a player 
and Nature. At each stage the player (also called agent, decision maker or predictor 
depending on the context) chooses an element of his decision set. At the same time, 
Nature chooses on her side a state of the world. Those sequences of choices generate a 
sequence of outcomes that induces an overall payoff to the player. 

The opponent is called Nature as we do not specify her payoff, her objectives or her
rationality; absolutely no assumption is made on her behavior, and future states of the
world cannot be inferred from the past. Typically the environment is not stochastic or
Bayesian but adversarial: for instance, Nature can represent one malignant opponent,
or a set of independent (or correlated) players. A crucial requirement of these models is
that a strategy of the player must be good (i.e., it must fulfill some exogenous criterion)
against every possible sequence of states of the world (or simply against any strategy of
Nature).

Depending on the structure of the outcome mappings, the overall objectives of the player
might vary. Hannan [30] studied the case where an outcome is actually a real payoff.
The player's goal is to maximize his average (or cumulative) payoff. As we made no
assumption on Nature's behavior, a player cannot guarantee himself a given exogenous
amount, unlike in traditional zero-sum games where a value can be guaranteed: assume
for instance that Nature decides to give a payoff of zero (or one, minus one, etc.) to the
player at each stage, no matter what he does.

The criterion Hannan introduced is called regret and measures the difference between 
the average payoff the player got and what he would have got if he had chosen the same 
action repeatedly. It is somehow related to convex optimization (if Nature chooses 
repeatedly the same loss function), or more precisely to online convex optimization. 
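
To make the definition concrete, here is a minimal Python sketch computing the average regret of a finite play. The helper `external_regret`, the 0/1 matching payoff and the two sequences are assumptions chosen for illustration; they are not objects taken from the paper.

```python
def external_regret(payoff, action_set, actions, states):
    """Average regret: best fixed action in hindsight minus realized payoff.

    payoff(a, b) is a scalar payoff; `actions` and `states` list the choices
    of the player and of Nature, stage by stage.
    """
    n = len(actions)
    realized = sum(payoff(a, b) for a, b in zip(actions, states)) / n
    best_fixed = max(sum(payoff(a, b) for b in states) / n for a in action_set)
    return best_fixed - realized


# Hypothetical toy game: payoff 1 when the action matches the state.
def match(a, b):
    return 1.0 if a == b else 0.0


states = [0, 0, 1, 0, 1, 0, 0, 0]
actions = [1, 0, 0, 0, 1, 1, 0, 1]
regret = external_regret(match, [0, 1], actions, states)
```

Here the player matched the state at 4 stages out of 8, while always playing action 0 would have matched 6 times, so the average regret is 0.75 - 0.5 = 0.25. A consistent strategy drives this quantity to zero (or below) as the number of stages grows.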

The main results of Hannan [30] are that such a consistent strategy, i.e., a strategy without
regret, exists, and he constructed one. This has been widely refined and improved using
different techniques and ideas by, notably (providing an exhaustive list seems almost
impossible as the subject has been developed by many different communities), Foster
& Vohra [23], Hart & Mas-Colell [31], Fudenberg & Levine [28], Lehrer [43], Auer,
Cesa-Bianchi & Gentile [3], Cesa-Bianchi & Lugosi [14] (see also references therein),
Sorin [71]...

When outcomes are vectorial (and not scalar) payoffs, the problem is closely related
to multicriteria optimization, each coordinate representing a different sub-objective.
Instead of considering some exogenous convex combination of these objectives or optimizing
them in a given order (to bring this framework back to the previous one), Blackwell [9]
introduced another concept. He considered that some target set is given and the player's
goal is that the average outcome converges to it; on the contrary, Nature tries to push
it away. Formally, a given closed set is approachable if the player has a strategy such
that the average payoff remains, after some maybe large stage, arbitrarily close to this
target set, no matter the sequence of moves of Nature.

Blackwell's approachability theory is quite elegant as it relies on simple geometric
properties. They allowed him to characterize explicitly approachable convex sets and to
provide a simple sufficient approachability condition for non-convex sets (such sets are
called, in reference to Blackwell, B-sets). Spinat [73] proved later that this was in fact
almost a necessary condition.

Maybe the first and most important use of this whole theory is due to Kohlberg [39].
He constructed, using this simple tool, an optimal strategy for the uninformed player in
zero-sum games with incomplete information, introduced by Aumann and Maschler [5]
(see for instance Mertens, Sorin & Zamir [55] and references given for more details on
this subject). Approachability also gained recent interest, both from the game theory
and machine learning communities, with works of - again non-exhaustively - Vieille [77],
Hart & Mas-Colell [32], Spinat [73], Lehrer [32], Benaïm, Hofbauer & Sorin [7], Mannor
& Shimkin [49], Lehrer & Solan [44, 46], As Soulaimani, Quincampoix & Sorin [2],
Mannor & Tsitsiklis [53], Perchet [60, 61], Rakhlin, Sridharan & Tewari [66], Perchet &
Quincampoix [63]...

Another (and the last to be considered here) criterion is calibration, written within
this framework by Dawid [16] and extended thereafter by, among many others, Foster &
Vohra [24], Fudenberg & Levine [27], Lehrer [41], Sandroni, Smorodinsky & Vohra [68],
Sorin [71], Perchet [60], Foster, Rakhlin, Sridharan & Tewari [22], and so on.
Here, a stage outcome is not some payoff (either scalar or vectorial) but the actual
state of the world chosen by Nature. The overall objective of the player is to predict,
sequentially, the whole sequence of states so that the average prediction and the empirical
distribution of states are asymptotically arbitrarily close. Without any other restrictions,
this is in fact fairly easy: one just has to predict at some stage the outcome of the
previous one.

Additional requirements can be, for instance, that predictions can only belong to
some finite (yet maybe large) set and that the empirical distribution of states on the
set of stages where a specific prediction is made is closer to this prediction than to any
other possible one. A usual and celebrated example is that of a meteorologist who
predicts, each day, the probability of rain the following day. Predictions belong to 0%,
10%, 20%, etc., and it is required that when the meteorologist says that the probability
of rain is, say, 30%, it rains on average between 25% and 35% of the time.
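
The requirement "closer to this prediction than to any other possible one" can be checked mechanically on a finite record. The following Python sketch is only an illustration: the function names, the grid step of 10% and the rain records are assumptions made up for the example.

```python
from collections import defaultdict


def calibration_gaps(predictions, outcomes):
    """For each announced forecast p, return |empirical frequency - p| over
    the stages where p was announced (outcomes are 1 for rain, 0 otherwise)."""
    stages = defaultdict(list)
    for p, rained in zip(predictions, outcomes):
        stages[p].append(rained)
    return {p: abs(sum(v) / len(v) - p) for p, v in stages.items()}


def is_grid_calibrated(predictions, outcomes, step=0.1):
    """True when every forecast is at least as close to its empirical
    frequency as any other grid point, i.e. every gap is at most step / 2."""
    gaps = calibration_gaps(predictions, outcomes)
    return all(gap <= step / 2 + 1e-12 for gap in gaps.values())


# Hypothetical record: ten days forecast at 30%, five days forecast at 80%.
predictions = [0.3] * 10 + [0.8] * 5
outcomes_good = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] + [1, 1, 0, 1, 1]  # 3/10 and 4/5
outcomes_bad = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] + [1, 0, 0, 1, 1]   # 3/10 and 3/5
```

On the second record it rained on 60% of the days forecast at 80%, which is closer to the grid point 60% than to 80%, so the check fails. The criterion is asymptotic in the text; this sketch only evaluates it on a finite sample.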

Oakes [57] and Dawid [17] proved that no deterministic algorithm can be calibrated
(yet this strong statement could be discussed) while randomized algorithms can, as proved
by Foster & Vohra [24]. The existence of such algorithms can be seen as a negative result,
as it claims that a strategic non-informed meteorologist can mimic an expert one (who
knows the true underlying process, if it exists); a whole literature studied this aspect
and recent results are gathered in the survey of Olszewski [58]. On the other hand, it
can also be seen as a positive result, as it states that the long term behavior of Nature
can asymptotically be predicted, and this might lead to another class of algorithms and
results, as in Foster & Vohra [23] or Perchet [59, 62].

A common feature of regret minimization and calibration is that they can be written
as a specific case of approachability of a well chosen target set in some auxiliary vectorial
payoff game. The first to notice this property was Blackwell [10] (this idea is already
mentioned at the end of the seminal paper of Hannan [30] or in Luce & Raiffa [48]),
followed by Foster [21], Hart & Mas-Colell [32], Lehrer & Solan [45], Sorin [71], Perchet [60],
Mannor & Stoltz [50], Abernethy, Bartlett & Hazan [1]...

We assumed implicitly that the player observes the sequence of states of the world;
this is in fact a crucial hypothesis here, sometimes referred to as full monitoring. In
particular, we will not consider the case of partial monitoring (or bandit problems), or
stochastic games (where, for instance, the whole sequence of outcomes could depend on
a unique choice at some stage). Those are also interesting subjects, yet far from the
current scope.

Objectives and Structure of the paper. 

Describing explicit interactions and equivalences between the notions of approachability,
calibration and regret is the central point of this paper, the final argument being
that explicit constructions of consistent and calibrated strategies (even for more precise
or refined notions than the ones introduced here) are possible and provided thanks to
approachability theory. The remainder is organized as follows:

In Section 1 we introduce the concept of approachability, centerpiece of this work.
We first recall (in Subsection 1.1) a sufficient and necessary condition under which
an arbitrary set is approachable. The specific case of convex sets, for which a complete
characterization is available, is studied in Subsection 1.2. First extensions and
generalizations of the framework (e.g., in infinite dimension, with variable stage durations,
unbounded payoffs, etc.) are given in Subsection 1.3. The last Subsection 1.4 is concerned
with other possible proofs and techniques of approachability. In particular, we show that
approachability with respect to the supremum norm can be achieved using some potential
minimization, generalizing the exponential weight algorithm; we also prove that the
usual Euclidean (or Hilbertian) framework is not necessary for approachability.

Proofs are almost always provided, as long as they bring something new to the 
literature (yet some technical lemmas are delayed to the Appendix). 

Regret minimization is introduced in Section 2. Several refinements are introduced
and links with game theory (as well as famous algorithms called exponential weight
algorithm and follow the perturbed leader) are given in Subsection 2.3. Since our purpose
is to provide reductions to some auxiliary approachability problems, proofs are only
sketched in this section and deferred to the last one. An example of regret minimization
with expert advice is given for illustration at the end; however, this subject is very well
studied in the book of Cesa-Bianchi & Lugosi [14].

Calibration and its generalizations are formalized in Section 3; for the same reasons,
proofs are essentially deferred to the last section. We provide there a discussion on whether
calibration (yet a weaker but maybe more intuitive notion) can or cannot be obtained
using deterministic algorithms.

The final Section 4 contains all the reductions to approachability. We prove (or recall)
how regret minimization (either with finite or infinite action spaces) and calibration
(either finite or with checking rules) can be obtained using approachability results from
the first section.

Maybe the most general results on regret minimization are Theorems 4.1 and 4.2,
which provide a strategy (explicit for the first one) minimizing swap regret if action spaces
are, respectively, finite or infinite. Proposition 4.2, due to Blackwell [10] himself, shows
how minimization of the supremum norm of regret is exactly approachability.

Concerning calibration, the most striking results might be Proposition 4.5, its
consequence Theorem 4.4, and Theorem 4.5. They refine and generalize recent results of
Mannor & Stoltz [50] as well as Rakhlin, Sridharan and Tewari [66].

We conclude this Section by explaining how the circle is complete: if regret minimiza- 
tion and calibration can be seen as specific instances of approachability, the converse is 
also true. Indeed, using some generalized notions of regret and/or calibration, one can 
construct approachability strategies (in the case of convex sets).

1 Blackwell's approachability



1.1 Approachability of arbitrary sets 

Consider a two-person repeated game between a player and Nature. Their action sets
are respectively denoted by $\mathcal{A}$ and $\mathcal{B}$ (of respective cardinality $A$ and $B$) and payoffs
are defined through some vectorial mapping $g : \mathcal{A} \times \mathcal{B} \to \mathbb{R}^d$. The game is repeated in
discrete time, and we denote the actions chosen at stage $n \in \mathbb{N}$ by $a_n \in \mathcal{A}$ and $b_n \in \mathcal{B}$; they
induce a payoff $g_n := g(a_n, b_n) \in \mathbb{R}^d$. Formally, $a_n$ and $b_n$ are functions of the history,
i.e., the past observations $h^{n-1} = (a_1, b_1, \ldots, a_{n-1}, b_{n-1}) \in (\mathcal{A} \times \mathcal{B})^{n-1} =: H_{n-1}$.

Explicitly, a strategy $\sigma$ of the player is a mapping from $H := \bigcup_{n \in \mathbb{N}} H_n$, the set
of finite histories, into $\Delta(\mathcal{A})$, the set of probability distributions over $\mathcal{A}$. Similarly, a
strategy $\tau$ of Nature is a mapping from $H$ into $\Delta(\mathcal{B})$. Kolmogorov's extension theorem
implies that a pair $(\sigma, \tau)$ induces a probability distribution $\mathbb{P}_{\sigma,\tau}$ over $\mathcal{H} = (\mathcal{A} \times \mathcal{B})^{\mathbb{N}}$, the
set of infinite histories of the game endowed with the product topology.

Before defining the concept of approachability, we introduce some notations. Given
a closed set $\mathcal{E} \subset \mathbb{R}^d$, we denote by $d_{\mathcal{E}}(x) = \inf_{z \in \mathcal{E}} \|x - z\|$ the distance from $x$ to
$\mathcal{E}$, by $\mathcal{E}^\delta = \{z \in \mathbb{R}^d \text{ s.t. } d_{\mathcal{E}}(z) < \delta\}$ the $\delta$-open neighborhood of $\mathcal{E}$, and by
$\Pi_{\mathcal{E}}(x) = \{z \in \mathcal{E} \text{ s.t. } \|x - z\| = d_{\mathcal{E}}(x)\}$ the projection of $x$ onto $\mathcal{E}$, which is in general not
single-valued. We also denote by $\mathrm{co}(\mathcal{E})$ the convex hull of a set. The mapping $g$ defined
on $\mathcal{A} \times \mathcal{B}$ (and more generally any such mapping) is extended to $\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$ by
$g(x, y) = \mathbb{E}_{x \otimes y}[g(a, b)]$. The average of a sequence $s = (s_m)_{m \in \mathbb{N}}$ up to stage $n \in \mathbb{N}$ is
denoted by $\bar{s}_n := \sum_{m=1}^{n} s_m / n$.
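
These notations translate directly into code when the target set is finite. The sketch below is an illustration only: it assumes a set given as a list of points in the plane, and the helper names are ours.

```python
import math


def dist(x, z):
    """Euclidean norm of x - z."""
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def distance_to_set(x, points):
    """Distance from x to a finite set given as a list of points."""
    return min(dist(x, z) for z in points)


def projection(x, points, tol=1e-12):
    """All points of the set achieving the distance (possibly several)."""
    d = distance_to_set(x, points)
    return [z for z in points if dist(x, z) <= d + tol]


def running_average(seq):
    """Averages (s_1 + ... + s_n) / n of a sequence of vectors."""
    total = [0.0] * len(seq[0])
    averages = []
    for n, s in enumerate(seq, start=1):
        total = [t + si for t, si in zip(total, s)]
        averages.append([t / n for t in total])
    return averages


# A two-point set: the projection of (0, 2) is not single-valued.
two_points = [(1.0, 0.0), (-1.0, 0.0)]
both = projection((0.0, 2.0), two_points)
```

The point (0, 2) is equidistant from both elements of the set, which illustrates why the projection is set-valued as soon as the set is not convex.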

Definition 1.1 A closed set $\mathcal{E} \subset \mathbb{R}^d$ is approachable by the player if he has a strategy $\sigma$
ensuring, for every $\varepsilon > 0$, the existence of some integer $N_\varepsilon \in \mathbb{N}$ such that, no matter the
strategy $\tau$ of Nature,

$$\sup_{n \ge N_\varepsilon} \mathbb{E}_{\sigma,\tau}\big[d_{\mathcal{E}}(\bar{g}_n)\big] \le \varepsilon \quad \text{and} \quad \mathbb{P}_{\sigma,\tau}\Big(\sup_{n \ge N_\varepsilon} d_{\mathcal{E}}(\bar{g}_n) \ge \varepsilon\Big) \le \varepsilon. \tag{1}$$

A set $\mathcal{E}$ is excludable by Nature if she can approach the complement of $\mathcal{E}^\delta$ for some $\delta > 0$.

Informally, a given set $\mathcal{E} \subset \mathbb{R}^d$ is approachable by the player if he has a strategy such that
the average payoff converges almost surely to $\mathcal{E}$, uniformly with respect to the strategies
of Nature. The right-hand side of Equation (1) clearly implies the left-hand one, which is
actually the most commonly used (and rates of convergence, i.e. smallest mappings
$\varepsilon \mapsto N_\varepsilon$ satisfying each condition, might differ).



1.1.1 Approachable arbitrary set: Blackwell's sufficient condition

Blackwell [9] provided a simple geometrical condition under which a set $\mathcal{E}$ is approachable.
This sufficient condition is in fact almost necessary (as proved in Section 1.1.2, following
Spinat [73]).

Definition 1.2 A closed set $\mathcal{E} \subset \mathbb{R}^d$ is a B-set if for every $z \in \mathbb{R}^d$, there exists a
projection $\pi \in \Pi_{\mathcal{E}}(z)$ and $x := x(z) \in \Delta(\mathcal{A})$ such that the hyperplane perpendicular to
$z - \pi$ at $\pi$ separates $z$ from $\{g(x, y),\ y \in \Delta(\mathcal{B})\}$, or formally:

$$\forall z \in \mathbb{R}^d,\ \exists \pi \in \Pi_{\mathcal{E}}(z),\ \exists x \in \Delta(\mathcal{A}),\ \forall y \in \Delta(\mathcal{B}) : \ \langle g(x, y) - \pi, z - \pi \rangle \le 0. \tag{2}$$

Blackwell [9] proved that being a B-set is sufficient for approachability; he also exhibited
a specific strategy, from now on referred to as Blackwell's (approachability) strategy.

Theorem 1.1 If $\mathcal{E}$ is a B-set, then $\mathcal{E}$ is approachable by the player. Moreover, the
strategy $\sigma$ defined by $\sigma(h^n) = x(\bar{g}_n)$ ensures that, for every $\eta > 0$ and against any
strategy $\tau$ of Nature:

$$\mathbb{E}_{\sigma,\tau}\big[d_{\mathcal{E}}(\bar{g}_n)\big] \le \frac{2\|g\|_\infty}{\sqrt{n}} \quad \text{and} \quad \mathbb{P}_{\sigma,\tau}\Big(\sup_{m \ge n} d_{\mathcal{E}}(\bar{g}_m) \ge \eta\Big) \le \frac{8\|g\|_\infty^2}{\eta^2 n}, \tag{3}$$

where $\|g\|_\infty = \sup_{a,b} \|g(a, b)\|$.

Blackwell [9] and Mertens, Sorin & Zamir [55] obtained respectively the bounds in
expectation and in probability. The very definition of $\|g\|_\infty$ allows each $g(a, b)$ to be a random
variable with bounded second moment.

We propose in the following Corollary 1.1 a slight variant that improves the constants
(in the deterministic case or when $\mathcal{E}$ is compact); for instance, they are divided by two
if $\mathcal{E} = \{0\}$, as in Section 1.3.6.

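Corollary 1.1 A closed set $\mathcal{E}$ is approachable if and only if $\mathcal{E}_g := \mathcal{E} \cap \mathrm{co}\{g(a, b);\ a \in \mathcal{A}, b \in \mathcal{B}\}$
is also approachable. Blackwell's strategy applied to $\mathcal{E}_g$ ensures that

$$\mathbb{E}_{\sigma,\tau}\big[d_{\mathcal{E}}(\bar{g}_n)\big] \le \sqrt{\frac{\kappa}{n}} \quad \text{and} \quad \mathbb{P}_{\sigma,\tau}\Big(\sup_{m \ge n} d_{\mathcal{E}}(\bar{g}_m) \ge \eta\Big) \le \frac{2\kappa}{\eta^2 n},$$

where $\kappa = (\|g\|_\infty + \|\mathcal{E}_g\|)^2$ and $\|\mathcal{E}_g\| := \sup\{\|z\|;\ z \in \mathcal{E}_g\}$ is smaller than $\|g\|_\infty$.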



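Proof: An approachability strategy of $\mathcal{E}$ ensures that any accumulation point of $\bar{g}_n$ must
belong to both the closed set $\mathcal{E}$ and to the compact set $\mathrm{co}\{g(a, b);\ a \in \mathcal{A}, b \in \mathcal{B}\}$, hence
to $\mathcal{E}_g$. Reciprocally, any approachability strategy of $\mathcal{E}_g$ approaches its super-set $\mathcal{E}$.

Let $\sigma$ be Blackwell's strategy applied to $\mathcal{E}_g$, define $\delta_n := d_{\mathcal{E}_g}(\bar{g}_n)$ and denote by $\pi_n$
any element of $\Pi_{\mathcal{E}_g}(\bar{g}_n)$ given by Equation (2). The definition of $d_{\mathcal{E}_g}$ implies that

$$
\begin{aligned}
\delta_{n+1}^2 \le \|\bar{g}_{n+1} - \pi_n\|^2
&= \Big\| \frac{n}{n+1}(\bar{g}_n - \pi_n) + \frac{1}{n+1}(g_{n+1} - \pi_n) \Big\|^2 \\
&= \frac{n^2}{(n+1)^2}\,\delta_n^2 + \frac{1}{(n+1)^2}\,\|g_{n+1} - \pi_n\|^2 + \frac{2n}{(n+1)^2}\,\langle \bar{g}_n - \pi_n,\ g_{n+1} - \pi_n \rangle.
\end{aligned}
$$

Conditioning on the finite history $h^n$ and using Equation (2) as well as the definitions
of $\|g\|_\infty$ and $\|\mathcal{E}_g\|$, the last inequality becomes

$$\mathbb{E}_{\sigma,\tau}\big[\delta_{n+1}^2 \,\big|\, h^n\big] \le \frac{n^2}{(n+1)^2}\,\delta_n^2 + \frac{(\|g\|_\infty + \|\mathcal{E}_g\|)^2}{(n+1)^2}$$

and, with a simple induction, $\mathbb{E}_{\sigma,\tau}[\delta_n^2] \le \kappa/n$. Thus $\bar{g}_n$ converges in probability towards
$\mathcal{E}$. The almost sure convergence is a consequence of the facts that

$$Z_n := \delta_n^2 + \sum_{k=n}^{\infty} \frac{\kappa}{(k+1)^2}$$

is a supermartingale and $\mathbb{E}_{\sigma,\tau}[Z_n] \le 2\kappa/n$.

Indeed, Doob's inequality (see Neveu [56], prop. IV.5.2) then implies that

$$\mathbb{P}_{\sigma,\tau}\big(\exists m \ge n \text{ s.t. } Z_m \ge \eta^2\big) \le \frac{\mathbb{E}_{\sigma,\tau}[Z_n]}{\eta^2} \le \frac{2\kappa}{\eta^2 n},$$

which gives the result.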
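
The "simple induction" of the proof can be spelled out as follows. This is a sketch under our reading of the conditional recursion, namely $\mathbb{E}[\delta_{n+1}^2 \mid h^n] \le \frac{n^2}{(n+1)^2}\delta_n^2 + \frac{\kappa}{(n+1)^2}$ with $\kappa = (\|g\|_\infty + \|\mathcal{E}_g\|)^2$; the base case uses $\delta_1 \le \|g_1 - z\| \le \|g\|_\infty + \|\mathcal{E}_g\|$ for any $z \in \mathcal{E}_g$.

```latex
\begin{align*}
\mathbb{E}[\delta_1^2] &\le \kappa, \\
\mathbb{E}[\delta_{n+1}^2]
  &\le \frac{n^2}{(n+1)^2}\,\mathbb{E}[\delta_n^2] + \frac{\kappa}{(n+1)^2}
   \le \frac{n^2}{(n+1)^2}\cdot\frac{\kappa}{n} + \frac{\kappa}{(n+1)^2} \\
  &= \frac{\kappa\,(n+1)}{(n+1)^2} = \frac{\kappa}{n+1}.
\end{align*}
```

Jensen's inequality then gives $\mathbb{E}[\delta_n] \le \sqrt{\mathbb{E}[\delta_n^2]} \le \sqrt{\kappa/n}$, which is the bound in expectation.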



Blackwell's strategy depends only on the sequence $(\bar{g}_n)_{n \in \mathbb{N}}$, so these results require
neither the finiteness of $\mathcal{B}$ nor that Nature's actions are observed. In fact, we could
as well assume the following model that we call the compact case (in opposition to the
finite case).

Action sets are compact and convex sets, denoted by $\mathcal{X} \subset \mathbb{R}^A$ and $\mathcal{U} \subset (\mathbb{R}^d)^A$. At
stage $n \in \mathbb{N}$, Nature chooses an outcome $U_n = (U_n^a)_{a \in \mathcal{A}} \in (\mathbb{R}^d)^A$ in $\mathcal{U}$ and the player
chooses $x_n \in \mathcal{X}$. Those choices incur the vector payoff $g_n = x_n . U_n \in \mathbb{R}^d$, the standard
inner product between $x_n$ and $U_n$. Condition (2) that defines B-sets then becomes

$$\forall z \in \mathbb{R}^d,\ \exists \pi \in \Pi_{\mathcal{E}}(z),\ \inf_{x \in \mathcal{X}} \sup_{U \in \mathcal{U}} \langle x.U - \pi, z - \pi \rangle \le 0.$$

It is also possible to incorporate randomness in this model. The compact and convex
sets $\mathcal{X}$ and $\mathcal{U}$ can be sets of probability distributions (this was the case when $\mathcal{X} = \Delta(\mathcal{A})$)
and in that case $x.U$ is the expectation of a random payoff associated with $x$ and $U$ (that
must have a second moment).

1.1.2 Equivalent formulations and necessary condition 

Blackwell defined geometrically a B-set from outside. As Soulaimani, Quincampoix &
Sorin [2] noticed that it can also be defined similarly from inside. Informally, one can
interpret these definitions slightly differently: instead of viewing approachability as the
convergence of average payoffs to $\mathcal{E}$, it can be understood as preventing average payoffs
from escaping $\mathcal{E}$.

First, we need to recall the notion of proximal normals to $\mathcal{E}$.

Definition 1.3 The set of proximal normals to some closed set $\mathcal{E} \subset \mathbb{R}^d$ at $e \in \mathcal{E}$ is
denoted by $NP_{\mathcal{E}}(e) \subset \mathbb{R}^d$ and is defined by:

$$NP_{\mathcal{E}}(e) := \{p \in \mathbb{R}^d;\ d_{\mathcal{E}}(e + p) = \|p\|\} = \{p \in \mathbb{R}^d;\ B(e + p, \|p\|) \cap \mathcal{E} = \emptyset\},$$

where $B(e + p, \|p\|)$ is the open ball of center $e + p$ and radius $\|p\|$.

The equivalent definition of a B-set, which is closely related to the notion of
discriminating set in differential games, is given by the following lemma whose proof is immediate
and omitted.

Lemma 1.2 A set $\mathcal{E}$ is a B-set if and only if:

$$\forall e \in \mathcal{E},\ \forall p \in NP_{\mathcal{E}}(e),\ \min_{x \in \Delta(\mathcal{A})} \max_{y \in \Delta(\mathcal{B})} \langle p, g(x, y) - e \rangle \le 0. \tag{4}$$

Interesting results on a slightly different (but equivalent as we shall see) notion of 
approachability that can be found in the literature can be easily derived from this alter- 
native definition of B-set. 

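Definition 1.4 Given $\varepsilon > 0$, a closed set $\mathcal{E} \subset \mathbb{R}^d$ is $\varepsilon$-approachable by the player if he
has a strategy $\sigma_\varepsilon$ ensuring that, after some stage $N_\varepsilon \in \mathbb{N}$, no matter the strategy $\tau$ of
Nature,

$$\sup_{n \ge N_\varepsilon} \mathbb{E}_{\sigma_\varepsilon,\tau}\big[d_{\mathcal{E}}(\bar{g}_n)\big] \le \varepsilon \quad \text{and} \quad \mathbb{P}_{\sigma_\varepsilon,\tau}\Big(\sup_{n \ge N_\varepsilon} d_{\mathcal{E}}(\bar{g}_n) \ge \varepsilon\Big) \le \varepsilon. \tag{5}$$

And a set $\mathcal{E}$ is 0-approachable if it is $\varepsilon$-approachable for every $\varepsilon > 0$.

The difference between approachability and $\varepsilon$-approachability is whether the strategy
can depend on $\varepsilon$ or not. It is clear that an approachable set is 0-approachable but
the converse is not immediate. It is easier to show - following Spinat [73] and thanks
to Lemma 1.3 - that a 0-approachable set must contain a B-set and so both notions
coincide.

Lemma 1.3 Let $(\mathcal{E}_n)_{n \in \mathbb{N}}$ be a decreasing sequence of compact non-empty 0-approachable
sets, then $\mathcal{E}_\infty := \bigcap_{n \in \mathbb{N}} \mathcal{E}_n$ is also a compact non-empty 0-approachable set.

Proof: One just has to notice that, for every $\varepsilon > 0$, the $\varepsilon/2$-neighborhood of $\mathcal{E}_\infty$
contains some $\mathcal{E}_n$, which is $\varepsilon/2$-approachable. And an $\varepsilon/2$-approachability strategy of
$\mathcal{E}_n$ will $\varepsilon$-approach $\mathcal{E}_\infty$. ■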



Lemma 1.3 is not trivially true for approachability¹. Indeed, one must find an
approachability strategy that is independent of $\varepsilon$ and a simple concatenation of those
might not work (except in the specific case of convex sets).

¹ S. Mannor pointed out this interesting property.

Proposition 1.4 If a closed set $\mathcal{E}$ is 0-approachable, it contains a B-set.

We only provide a sketch of the proof; complete details can be found in Spinat [73].

Proof: Consider the family of all compact subsets of $\mathcal{E}$ that are 0-approachable.
It is a non-empty family, ordered by inclusion and, because of Lemma 1.3, every totally
ordered subset has a lower bound (the intersection of all elements of this subset) which
belongs to this family. Thus Zorn's lemma yields that a minimal element $\mathcal{E}_\infty$ exists and
we claim that $\mathcal{E}_\infty$ is a B-set.

Indeed, assume the converse: Condition (4) does not hold for some $e \in \mathcal{E}_\infty$ and some
proximal normal $p \in NP_{\mathcal{E}_\infty}(e)$. So there exists $y_0 \in \Delta(\mathcal{B})$ such that

$$0 < \min_{x \in \Delta(\mathcal{A})} \max_{y \in \Delta(\mathcal{B})} \langle p, g(x, y) - e \rangle = \max_{y \in \Delta(\mathcal{B})} \min_{x \in \Delta(\mathcal{A})} \langle p, g(x, y) - e \rangle =: \min_{x \in \Delta(\mathcal{A})} \langle p, g(x, y_0) - e \rangle.$$

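In particular, Definition 1.3 of proximal normals implies that, at least for some small
$\lambda \in (0, 1)$, $(1 - \lambda)e + \lambda g(x, y_0)$ belongs, for every $x \in \Delta(\mathcal{A})$, to $B(e + p, \|p\|)$. Therefore,
there exists $\delta > 0$ such that

$$d_{\mathcal{E}_\infty}\big((1 - \lambda)e + \lambda g(x, y_0)\big) \ge \delta, \quad \forall x \in \Delta(\mathcal{A}). \tag{6}$$

By continuity, Equation (6) holds (up to $\delta/2$ instead of $\delta$) on a small open neighborhood
$V$ of $e$. We shall prove that this implies that $\mathcal{E}_\infty \setminus V$ is still 0-approachable; this
contradicts the minimality of $\mathcal{E}_\infty$, which must therefore be a B-set.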

Assume that at some stage $n \in \mathbb{N}$, $\bar{g}_n$ belongs to $V$ and that Nature plays repeatedly
according to $y_0$ afterwards. Then if $n$ is large enough, there exists some large $m \in \mathbb{N}$ such
that $m/(n+m)$ and $\bar{g}_{n+m}$ are respectively, and with arbitrarily high probability, arbitrarily
close to $\lambda$ and to some $(1 - \lambda)\bar{g}_n + \lambda g(x, y_0)$, which is at distance at least $\delta/2$ from $\mathcal{E}_\infty$.

Consider a $\delta/4$-approachability strategy of $\mathcal{E}_\infty$ denoted by $\sigma_{\delta/4}$. For some large
$N \in \mathbb{N}$ independent of $\tau$, the $\mathbb{P}_{\sigma_{\delta/4},\tau}$-probability that $\bar{g}_n$ belongs to $V$ for some $n \ge N$
must therefore be smaller than $\delta/4$. In particular, this implies that $\bar{g}_n$ stays within $\delta$
of $\mathcal{E}_\infty \setminus V$ with probability greater than $1 - \delta$. Thus, for every $\delta > 0$, there exists a
$\delta$-approachability strategy of $\mathcal{E}_\infty \setminus V$. ■

A direct consequence of Theorem 1.1 and Proposition 1.4 is the characterization of
approachable sets.

Theorem 1.2 A closed set $\mathcal{E}$ is approachable if and only if it contains a B-set.

1.2 Specific case of convex sets

In the specific case of convex sets, there exists a dual and complete characterization of
approachability and excludability due to Blackwell [9]. It is somehow a consequence of
the fact that, for any $z$ in some closed and convex set $C \subset \mathbb{R}^d$, one has:

$$NP_C(z) = \{p \in \mathbb{R}^d;\ \langle p, c - z \rangle \le 0,\ \forall c \in C\}; \tag{7}$$

in particular this implies that $NP_C(z)$ is a cone, referred to as the normal cone.

1.2.1 Complete characterization of approachable convex sets

Theorem 1.3 A closed and convex set $C \subset \mathbb{R}^d$ is approachable by the player if and only
if:

$$\forall y \in \Delta(\mathcal{B}),\ \exists x \in \Delta(\mathcal{A}),\ g(x, y) \in C. \tag{8}$$

And a convex set is either approachable by the player or excludable by Nature.

Proof: Let $C \subset \mathbb{R}^d$ be a convex set and $p \in \mathbb{R}^d$ be a proximal normal of $C$ at some
$z \in C$. Because of Property (7), Condition (8) can be immediately rewritten into

$$\max_{y \in \Delta(\mathcal{B})} \min_{x \in \Delta(\mathcal{A})} \langle p, g(x, y) - z \rangle \le 0.$$

The mapping $(x, y) \mapsto \langle p, g(x, y) - z \rangle$ is linear in both of its arguments, so von Neumann's
minmax theorem implies that the operators min and max can be switched, i.e.,

$$\min_{x \in \Delta(\mathcal{A})} \max_{y \in \Delta(\mathcal{B})} \langle p, g(x, y) - z \rangle = \max_{y \in \Delta(\mathcal{B})} \min_{x \in \Delta(\mathcal{A})} \langle p, g(x, y) - z \rangle \le 0, \tag{9}$$

thus $C$ is a B-set and is approachable by the player.

On the contrary, if Condition (8) is not satisfied, there exists some $y_0 \in \Delta(\mathcal{B})$ such
that $g(x, y_0) \notin C$ for every $x \in \Delta(\mathcal{A})$. By continuity, there exists $\delta > 0$ such that
$d_C(g(x, y_0)) \ge \delta$ for every $x \in \Delta(\mathcal{A})$. If Nature plays repeatedly according to $y_0$, then the
law of large numbers implies that $\bar{g}_n$ converges uniformly to the set $\{g(x, y_0),\ x \in \Delta(\mathcal{A})\}$ which
is included in the complement of $C^\delta$. So $C$ is excludable by Nature and, of course, is not
approachable by the player. ■
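
As a worked illustration of Condition (8), consider the following toy game, which is our own example and not one from the paper: $d = 1$, two actions on each side, and $g(a, b) = 1$ if $a = b$, $-1$ otherwise. Writing $x$ and $y$ for the probabilities that the player and Nature put on their first action:

```latex
\begin{align*}
g(x, y) &= (2x - 1)(2y - 1), \\
C = (-\infty, 0] :\quad & \forall y,\ \text{take } x = \mathbf{1}\{y < 1/2\}
      \ \Rightarrow\ g(x, y) = -|2y - 1| \in C
      \quad\text{(approachable)}, \\
C' = (-\infty, -1/2] :\quad & y_0 = 1/2 \ \Rightarrow\ g(x, y_0) = 0 \ \ \forall x,
      \quad d_{C'}\big(g(x, y_0)\big) = 1/2
      \quad\text{(excludable)}.
\end{align*}
```

In the second case Nature excludes $C'$ simply by playing $y_0 = 1/2$ at every stage, exactly as in the second half of the proof.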



The proof of Theorem 1.3 relies on the Hilbertian structure of $\mathbb{R}^d$. However, using
different arguments, it can be generalized to any normed space, see Theorem 1.7.

Remark 1.1 In the specific case of a convex set, Blackwell's strategy at stage $n + 1 \in \mathbb{N}$
can be decomposed as follows:

i) Given $\bar{g}_n \in \mathbb{R}^d$, compute its projection $\Pi_C(\bar{g}_n)$ on the closed and convex set $C$;

ii) Solve the projected zero-sum game defined by Equation (9), i.e., find $x_{n+1} \in \Delta(\mathcal{A})$
that minimizes this problem and choose $a_{n+1}$ according to it.

These steps ensure that $x_{n+1} = x(\bar{g}_n)$ as introduced in Definition 1.2. So Blackwell's
strategy reduces to a projection onto a convex set and the resolution of some linear program
(solving a zero-sum game can be reduced to the latter, see Sorin [70], appendix A).
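
The two steps of the remark can be sketched in Python on a toy instance. Everything specific below is an assumption made for illustration: the 2x2 matching game, the vector payoff built from it, the target set $C = \{z \le 0\}$, and Nature's rule. Because the player has only two actions, the linear program of step ii) is replaced by an exact one-dimensional search, and expected payoffs $g(x_n, b_n)$ are averaged instead of sampled ones so that the run is deterministic.

```python
import math


def u(a, b):
    """Toy scalar payoff (an assumption): 1 when the action matches the state."""
    return 1.0 if a == b else 0.0


def g(a, b):
    """Vector payoff in R^2: regret of action a against each fixed action."""
    return (u(0, b) - u(a, b), u(1, b) - u(a, b))


def g_mixed(q, b):
    """Expected payoff g(x, b) when action 0 is played with probability q."""
    return tuple(q * g0 + (1 - q) * g1 for g0, g1 in zip(g(0, b), g(1, b)))


def project(z):
    """Step i): projection onto the convex set C = {z in R^2 : z <= 0}."""
    return tuple(min(zi, 0.0) for zi in z)


def solve_projected_game(avg, pi):
    """Step ii): minimise over q in [0, 1] the worst case over b of
    <avg - pi, g(x_q, b) - pi>. Each b gives a line in q, so the minimum of
    their maximum sits at an endpoint or at the crossing of the two lines."""
    p = tuple(a - c for a, c in zip(avg, pi))

    def line(b):
        def at(q):
            return sum(pk * (gk - ck) for pk, gk, ck in zip(p, g_mixed(q, b), pi))
        return at(0.0), at(1.0) - at(0.0)  # intercept, slope

    (c0, s0), (c1, s1) = line(0), line(1)
    candidates = [0.0, 1.0]
    if s0 != s1:
        crossing = (c1 - c0) / (s0 - s1)
        if 0.0 < crossing < 1.0:
            candidates.append(crossing)
    return min(candidates, key=lambda q: max(c0 + s0 * q, c1 + s1 * q))


def run(n_stages):
    """Distances from the average payoff to C along a play against a hostile Nature."""
    avg, distances = (0.0, 0.0), []
    for n in range(1, n_stages + 1):
        q = solve_projected_game(avg, project(avg))
        b = 0 if q < 0.5 else 1  # Nature picks the state the player bets against
        payoff = g_mixed(q, b)
        avg = tuple(a + (gk - a) / n for a, gk in zip(avg, payoff))
        distances.append(math.hypot(*(max(a, 0.0) for a in avg)))
    return distances


distances = run(2000)
```

In this instance every payoff has norm at most 1, and the distance to $C$ after $n$ stages stays below $2/\sqrt{n}$, in line with the rate of Theorem 1.1. For larger action sets, step ii) would require a genuine linear programming solver.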

On the other hand, checking whether a convex set is approachable or not, i.e., if it
satisfies Condition (8) (or equivalently the more complicated B-set condition), is NP-hard,
even with $C = \{0\}$. Mannor & Tsitsiklis [53] have indeed reduced this to the 3-SAT problem.

In the compact case where action sets are $\mathcal{X} \subset \mathbb{R}^A$ and $\mathcal{U} \subset ([0, 1]^d)^A$, a closed convex
$C \subset \mathbb{R}^d$ is approachable if and only if

$$\forall U \in \mathcal{U},\ \exists x \in \mathcal{X},\ x.U \in C.$$

1.2.2 Sharper high probability bounds



In this section, we use the convexity of $C$ to exhibit high probability bounds improving
Corollary 1.1.

Corollary 1.5 If $C \subset \mathbb{R}^d$ is a closed and convex approachable set, Blackwell's strategy
ensures that for every $\eta > 0$ and against any strategy $\tau$ of Nature:

$$\mathbb{P}_{\sigma,\tau}\Big(\sup_{m \ge n}\Big[d_C(\bar{g}_m) - \frac{2\|g\|_\infty}{\sqrt{m}}\Big] \ge \eta\Big) \le 4 \exp\Big(-\frac{\eta^2 n}{32\|g\|_\infty^2}\Big). \tag{10}$$

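Proof: Distance to a convex set is Lipschitz and convex, so

$$
\begin{aligned}
\Big\{\sup_{m \ge n}\Big[d_C(\bar{g}_m) - \frac{2\|g\|_\infty}{\sqrt{m}}\Big] \ge \eta\Big\}
&\subset \Big\{\sup_{m \ge n}\Big[\|\bar{g}_m - \mathbb{E}[\bar{g}_m]\| + d_C\big(\mathbb{E}[\bar{g}_m]\big) - \frac{2\|g\|_\infty}{\sqrt{m}}\Big] \ge \eta\Big\} \\
&\subset \Big\{\sup_{m \ge n}\Big[\|\bar{g}_m - \mathbb{E}[\bar{g}_m]\| + \mathbb{E}\big[d_C(\bar{g}_m)\big] - \frac{2\|g\|_\infty}{\sqrt{m}}\Big] \ge \eta\Big\} \\
&\subset \Big\{\sup_{m \ge n}\|\bar{g}_m - \mathbb{E}[\bar{g}_m]\| \ge \eta\Big\},
\end{aligned}
$$

where the third inclusion is a consequence of the rate of convergence of Blackwell's strategy.
We conclude using Lemma 5.3. ■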



This result must be put in perspective with Corollary 1.1, which states that, for any
arbitrary approachable set $\mathcal{E}$ and every $\eta > 0$, $\mathbb{P}_{\sigma,\tau}\big(\sup_{m \ge n} d_{\mathcal{E}}(\bar{g}_m) \ge \eta\big) \le \big(\eta^2 n / 8\|g\|_\infty^2\big)^{-1}$.
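
To get a feeling for the gap between the two rates, the following Python sketch evaluates both right-hand sides as we read them from the text: the polynomial bound $8\|g\|_\infty^2/(\eta^2 n)$ and the exponential bound $4\exp(-\eta^2 n/(32\|g\|_\infty^2))$. The function names and the numerical values are assumptions chosen for illustration, and the two bounds control slightly different events (the second one concerns the distance shifted by $2\|g\|_\infty/\sqrt{m}$), so the comparison is only indicative.

```python
import math


def polynomial_bound(eta, n, g_norm=1.0):
    """Bound 8 ||g||^2 / (eta^2 n), valid for arbitrary approachable sets."""
    return 8.0 * g_norm ** 2 / (eta ** 2 * n)


def exponential_bound(eta, n, g_norm=1.0):
    """Bound 4 exp(-eta^2 n / (32 ||g||^2)), specific to convex sets."""
    return 4.0 * math.exp(-(eta ** 2) * n / (32.0 * g_norm ** 2))


# Rows of (n, polynomial bound, exponential bound) for eta = 0.1.
rows = [(n, polynomial_bound(0.1, n), exponential_bound(0.1, n))
        for n in (10 ** 3, 10 ** 4, 10 ** 5)]
```

For small horizons the exponential bound is vacuous (larger than 1), but it becomes far smaller than the polynomial one once $\eta^2 n$ is large compared to $32\|g\|_\infty^2$.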

1.2.3 Biased approachability 

We assume in this section that the closed and convex set $C \subset \mathbb{R}^d$ is not approachable by
the player. In that case, the natural extension of Blackwell's strategy would be defined by
$\sigma(h^n) = x_{n+1} \in \Delta(\mathcal{A})$, where $x_{n+1}$ is optimal in the projected zero-sum game with payoffs

$$\langle g(x, y) - \Pi_C(\bar{g}_n),\ \bar{g}_n - \Pi_C(\bar{g}_n) \rangle.$$



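Corollary 1.6 Even if a closed and convex set $C \subset \mathbb{R}^d$ is not approachable by the player,
Blackwell's strategy $\sigma$ ensures that

$$\mathbb{E}_{\sigma,\tau}\big[d_C(\bar{g}_n) - \delta\big] \le \frac{\|g\|_\infty + \|C\|}{\sqrt{n}} + \frac{\delta}{\sqrt{n}}, \quad \text{where} \quad \delta = \sup_{y \in \Delta(\mathcal{B})} \inf_{x \in \Delta(\mathcal{A})} d_C\big(g(x, y)\big).$$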

Proof: We only need to prove that $\sigma$ is in fact exactly Blackwell's approachability
strategy of the closure of $C^\delta$ (the $\delta$-neighborhood of $C$) which is by definition and Condition (8)
approachable. This is simply due to the fact that, for every $z$ outside $C^\delta$:

$$\Pi_{C^\delta}(z) = \Pi_C(z) + \delta\,\frac{z - \Pi_C(z)}{\|z - \Pi_C(z)\|}.$$

Indeed, $C^\delta = C + \delta B(0, 1)$, so $\Pi_{C^\delta}(z)$ minimizes $\|z - (c + \delta e)\|^2 = \|z - c\|^2 - 2\delta\langle z - c, e \rangle + \delta^2\|e\|^2$
over $(c, e) \in C \times B(0, 1)$. And necessarily, one must have $e = (z - c)/\|z - c\|$ and $c = \Pi_C(z)$.
The result follows from the fact that $d_C(z) \le d_{C^\delta}(z) + \delta$ and $\|C^\delta\| \le \|C\| + \delta$. ■
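
The identities $e = (z - c)/\|z - c\|$ and $c = \Pi_C(z)$ used in this proof can be checked numerically. The sketch below is only an illustration under assumptions of ours: $C$ is the square $[-1, 1]^2$, $\delta = 0.5$, and the candidate projection is compared with a brute-force search over sampled points $c + \delta e$ of the neighborhood.

```python
import math


def project_box(z, half_width=1.0):
    """Projection onto the square C = [-half_width, half_width]^2."""
    return tuple(max(-half_width, min(half_width, zi)) for zi in z)


def project_neighbourhood(z, delta):
    """Candidate projection onto the closed delta-neighbourhood of C, built as
    c + delta * e with c the projection of z on C and e = (z - c) / ||z - c||."""
    c = project_box(z)
    gap = math.dist(z, c)
    if gap <= delta:  # z already lies in the neighbourhood
        return z
    return tuple(ci + delta * (zi - ci) / gap for zi, ci in zip(z, c))


def brute_force_distance(z, delta, grid=20, angles=360):
    """Smallest distance from z to sampled points c + delta * e, with c on a
    grid of the square and e on the unit circle."""
    best = float("inf")
    for i in range(grid + 1):
        for j in range(grid + 1):
            c = (-1.0 + 2.0 * i / grid, -1.0 + 2.0 * j / grid)
            for k in range(angles):
                t = 2.0 * math.pi * k / angles
                point = (c[0] + delta * math.cos(t), c[1] + delta * math.sin(t))
                best = min(best, math.dist(z, point))
    return best


z, delta = (3.0, 2.0), 0.5
exact = math.dist(z, project_neighbourhood(z, delta))
sampled = brute_force_distance(z, delta)
```

For $z = (3, 2)$ the projection on the square is the corner $(1, 1)$, at distance $\sqrt{5}$, so the distance to the neighborhood is $\sqrt{5} - 0.5$; the sampled search never beats it and gets within the discretisation error.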



The key ingredient of Corollary 1.6 is not the rates of convergence (which are a direct
consequence of the fact that $C^\delta$ is approachable), but the fact that it does not require
the computation of $\delta$ and $C^\delta$ (we recall that determining if a convex set is approachable is
NP-hard, thus determining the smallest approachable extension is even more complex). Notice that if
$C$ is approachable, then $\delta = 0$ by Condition (8), and the rates of Corollary 1.6 and of Theorem 1.1 match.

This result has to be put in perspective with the following proposition that also deals
with biased approachability, yet on a different level.

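Proposition 1.7 Assume that player and Nature strategies generate a sequence of payoffs
such that, at every stage $n$,

$$\big\langle \bar{g}_n - \pi_{\mathcal{E}}(\bar{g}_n),\ \mathbb{E}_{\sigma,\tau}[g_{n+1} \mid h^n] - \pi_{\mathcal{E}}(\bar{g}_n) \big\rangle \le \varepsilon_n,$$

for some sequence $\varepsilon_n$. Then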



dsig 



< — I — and 

n [n + 1)^ 



P.,, sup dsig^,) >v]< V"^" ■ 

\m>n ) V n 

In particular, if $\varepsilon_n$ converges to 0, then $\bar{g}_n$ converges in expectation to $\mathcal{E}$; the
convergence is almost sure as soon as $\sum_n \varepsilon_n / n < \infty$.

Proof: The proof is identical to the one of Corollary 1.1. ■



Actually, the result is stated for arbitrary sets and holds for non-deterministic
sequences of $\varepsilon_n$. On the other hand, for convex sets, concentration inequalities introduced
in the previous section show that

P,.{ sup - ?gf^ > ,} < 3exp ( - 

thus $\bar{g}_n$ converges almost surely to $C$ as soon as $\varepsilon_n$ goes (in expectation) to 0.

1.3 Generalizations and extensions

1.3.1 Deterministic approachability and procedures in law

As mentioned in Section 1.1.1, Blackwell's approachability strategy does not use the fact
that actions chosen by Nature are observed, as it is only required to observe the sequence
of payoffs. In fact, it is not even required that the random variable $g_n = g(a_n, b_n)$ is
perfectly observed.

Indeed, denote by $\gamma_n$ the observation made after stage $n$, and assume it is equal to
either $g(x_n, b_n)$ or $g(x_n, y_n)$, where $x_n$ and $y_n$ are the mixed actions of stage $n$ (i.e., the laws of
$a_n$ and $b_n$). Blackwell's strategy applied to the sequence of $\gamma_n$ ensures that the sequence
of deterministic averages $\bar{\gamma}_n$ converges to $\mathcal{E}$, uniformly with respect to Nature's strategy.

To conclude that this describes an approachability strategy, it remains to notice that
$d_{\mathcal{E}}(\bar{g}_n) \le d_{\mathcal{E}}(\bar{\gamma}_n) + \|\bar{g}_n - \bar{\gamma}_n\|$ and that the norm of $\bar{g}_n - \bar{\gamma}_n$ converges almost surely to zero,
because it is an average of bounded martingale differences (using classical concentration
arguments to get rates of convergence independent of strategies).
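
The last argument can be illustrated by a small seeded simulation comparing the realized average $\bar{g}_n$ with the average $\bar{\gamma}_n$ of the expected payoffs $g(x_n, b_n)$. The payoff function, the mixed actions and Nature's pattern below are all assumptions invented for the example; in particular the mixed actions do not come from Blackwell's strategy, since only the martingale structure matters here.

```python
import math
import random


def g(a, b):
    """Toy vector payoff in R^2 (an assumption for illustration)."""
    return (1.0 if a == b else 0.0, float(a))


def gap_between_averages(n_stages, seed=0):
    """Norm of (realized average payoff) - (average of g(x_n, b_n))."""
    rng = random.Random(seed)
    diff_sum = [0.0, 0.0]
    for n in range(1, n_stages + 1):
        q = 0.5 + 0.4 * math.sin(n / 50.0)  # probability of action 0 at stage n
        b = n % 2                           # a fixed pattern for Nature
        a = 0 if rng.random() < q else 1    # realized action drawn from x_n
        realized = g(a, b)
        expected = tuple(q * g0 + (1 - q) * g1 for g0, g1 in zip(g(0, b), g(1, b)))
        diff_sum = [s + r - e for s, r, e in zip(diff_sum, realized, expected)]
    return math.hypot(*(s / n_stages for s in diff_sum))


gap = gap_between_averages(20000)
```

Each stage contributes a bounded, conditionally centered difference, so the gap is of order $1/\sqrt{n}$ and the realized and expected averages are asymptotically indistinguishable.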



1.3.2 Approachability in infinite dimension spaces 



We assume in this section that g no longer takes value in some Euclidian space. Formally, 
there exists a probability space (fJ, /i, J^) such that, for every a £ A and b E B, g{a, b) G 
L2(f^,/i,-F) - g is extended to A(^) x A{B) as before. The finite case can be easily 
embedded into this framework by defining, Q, = {1, . . . ,d} and = ^ Ylk=i ^k- 

In this context, notions of approachability slightly differ, as the uniform convergence 
with respect to Nature's strategy is not required: 

Definition 1.5 A closed set $\mathcal{E} \subset \mathbb{R}^d$ is approachable by the player if he has a strategy 
$\sigma$ ensuring that, no matter the strategy $\tau$ of Nature, $\bar g_n$ converges $\mu$-almost surely to $\mathcal{E}$, 
for $\mathbb{P}_{\sigma,\tau}$-almost every history. 

A set $\mathcal{E}$ is excludable by Nature if she can approach the complement of $\mathcal{E}^\delta$ for some 
$\delta > 0$. 



Lehrer [42] has proved that the natural inner product of $L^2(\Omega, \mu, \mathcal{F})$ allows one to extend 
the definition of B-sets and that Blackwell's characterization of approachable convex sets still 
holds (Equation (8), in the previous section). 



Theorem 1.4 A closed convex set $\mathcal{C}$ is approachable if and only if 

$$\forall y \in \Delta(B),\ \exists x \in \Delta(A),\quad g(x,y) \in \mathcal{C}.$$

The proof relies on the following geometric principle, adapted from Lehrer [42]. 



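Lemma 1.8 Let $\mathcal{C}$ be a closed convex subset of $L^2(\Omega, \mu, \mathcal{F})$. If, for every $n \in \mathbb{N}$, $g_n$ 
is bounded $\mu$-as by $M \in L^2(\Omega, \mu, \mathcal{F})$ and $\langle \bar g_n - \Pi_{\mathcal{C}}(\bar g_n),\ g_{n+1} - \Pi_{\mathcal{C}}(\bar g_n) \rangle \le 0$, then $\bar g_n$ 
converges $\mu$-as to $\mathcal{C}$. 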

Proof: Let us denote $f_n = \bar g_n - \Pi_{\mathcal{C}}(\bar g_n)$. The finite dimensional arguments of the proof 
of Corollary 1.1 imply that $\|f_n\| \le 2\|M\|/\sqrt{n}$, thus $\bar g_n$ converges in probability to $\mathcal{C}$. 
The almost sure convergence is a consequence of the fact that 

$$\|f_{n+1} - f_n\| \le \big\|(\bar g_{n+1} - \bar g_n) - \big(\Pi_{\mathcal{C}}(\bar g_{n+1}) - \Pi_{\mathcal{C}}(\bar g_n)\big)\big\| \le 2\,\|\bar g_{n+1} - \bar g_n\| \le \frac{4\|M\|}{n+1},$$

so $f_n$ has small increments and we conclude using the technical Lemma 5.4. ■ 

Convexity of $\mathcal{C}$ is only used to get a Lipschitzian projection. 

Proof of Theorem 1.4: All the arguments behind the proof of Theorem 1.3 hold 
in $L^2(\Omega, \mu, \mathcal{F})$. Therefore, a closed convex set satisfying Blackwell's condition remains a 
B-set with respect to the natural inner product of $L^2(\Omega, \mu, \mathcal{F})$. 

Assume that $\mathcal{C}$ is a B-set and consider Blackwell's strategy, denoted as usual by $\sigma$ 
(and $\tau$ is Nature's strategy). Let $\mu \otimes \mathbb{P}_{\sigma,\tau}$ be the product measure on $\Omega \times \mathcal{H}$ on which 
we define the random variable $g_n$ by $g_n[\omega, h] = g(a_n, b_n)[\omega]$ where $(a_n, b_n)$ is the pair 
of actions played at stage $n$ according to $h$. Since $A$ and $B$ are finite, $g_n$ and $\bar g_n$ are 
uniformly bounded and the sequence $\bar g_n$ satisfies the geometric principle. 

As a consequence, $\bar g_n$ converges $\mathbb{P}_{\sigma,\tau}$-as to $\mathcal{C}$ which is therefore approachable. ■ 



1.3.3 Approachability with infinite action space — non-linear approachability 

It is also possible to generalize the previous results when action spaces are not necessarily 
finite but are two subsets of a given topological space, denoted by $\mathcal{X}$ and $\mathcal{Y}$. The payoff mapping 
$g$ is now a function from $\mathcal{X} \times \mathcal{Y}$ into $L^2(\Omega, \mu, \mathcal{F})$. In particular, it is not required in this 
section that $g$ is linear in each of its variables. 

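Theorem 1.5 Assume the following regularity assumptions on $g$: 

a) there exists $M \in L^2(\Omega, \mu, \mathcal{F})$ such that $|g(x,y)| \le M$, $\mu$-as, for every $(x,y) \in \mathcal{X} \times \mathcal{Y}$; 

b) for every $y \in \mathcal{Y}$, $G(y)$, the closure of $\{g(x,y),\ x \in \mathcal{X}\}$, is a compact and convex set; 

c) for every $u \in L^2(\Omega, \mu, \mathcal{F})$ such that $\sup_{c \in \mathcal{C}} \langle c, u \rangle < +\infty$, the zero-sum game with 
payoffs defined by $\langle u, g(x,y) \rangle$ has a value. 

Then it holds that 

i) Blackwell's characterization of convex approachable sets holds: 

$\mathcal{C}$ is approachable (in pure strategy) if and only if $\forall y \in \mathcal{Y}$, $G(y) \cap \mathcal{C} \neq \emptyset$; 

ii) $\mathcal{C}$ is approachable if and only if for every $z \in L^2(\Omega, \mu, \mathcal{F})$: 

$$\sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} \langle z - \Pi_{\mathcal{C}}(z),\ g(x,y) - \Pi_{\mathcal{C}}(z) \rangle \le 0;$$

iii) If there exists $y_0$ such that $G(y_0) \cap \mathcal{C} = \emptyset$, then $\mathcal{C}$ is excludable by Nature. 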






Proof: The deterministic approachability strategy associated with Blackwell's 
characterization is defined as follows. Denote as before by $\bar g_n \in L^2(\Omega, \mu, \mathcal{F})$ the average payoff 
up to stage $n$. Since $\Pi_{\mathcal{C}}$ is the projection onto a convex set, one has 

$$\sup_{c \in \mathcal{C}} \langle c,\ \bar g_n - \Pi_{\mathcal{C}}(\bar g_n) \rangle \le \langle \Pi_{\mathcal{C}}(\bar g_n),\ \bar g_n - \Pi_{\mathcal{C}}(\bar g_n) \rangle < +\infty.$$

Assumption c) ensures that the game with payoff $\langle g(x,y) - \Pi_{\mathcal{C}}(\bar g_n),\ \bar g_n - \Pi_{\mathcal{C}}(\bar g_n) \rangle$ has a 
value which is, using Blackwell's characterization, less than or equal to 0. The approachability 
strategy consists in playing $x_{n+1} \in \mathcal{X}$, any $2^{-n}$-optimal strategy of the latter game, i.e., 

$$\sup_{y \in \mathcal{Y}} \langle g(x_{n+1}, y) - \Pi_{\mathcal{C}}(\bar g_n),\ \bar g_n - \Pi_{\mathcal{C}}(\bar g_n) \rangle \le 2^{-n}.$$

The fact that this describes an approachability strategy follows from arguments used in 
the proof of Corollary 1.1 and the technical Lemma 5.4. 

Assume that Blackwell's condition does not hold, i.e., there exists $y_0$ such that $G(y_0) \cap 
\mathcal{C} = \emptyset$; Nature, by playing repeatedly $y_0$, can ensure that $\bar g_n$ belongs to $G(y_0)$. The 
intersection between the closed convex set $\mathcal{C}$ and the compact convex set $G(y_0)$ is empty, 
so they can be strictly separated. Since Nature can approach $G(y_0)$, $\mathcal{C}$ is excludable, thus 
not approachable. ■ 



Assumption b) is required to get point iii). Second conditions of i) and ii) are 
sufficient for approachability (but not necessary). 

When action sets $A$ and $B$ are finite, the projected game with payoff $\langle g(a,b), u \rangle$ 
typically does not have a value for some $u \in L^2(\Omega, \mu, \mathcal{F})$; so we considered instead mixed 
actions and strategies. This can be generalized when action spaces are two measurable 
sets $(A, \mathcal{A})$ and $(B, \mathcal{B})$, using the same tools as for procedures in law, see Section 1.3.1. 

Denote by $\mathcal{X}$ and $\mathcal{Y}$ the sets of probability distributions on $(A, \mathcal{A})$ and $(B, \mathcal{B})$, 
endowed with the weak-* topology; the mapping $g$ is extended to $\mathcal{X} \times \mathcal{Y}$ multi-linearly 
as usual. Then, under mild assumptions (for example if $A$ and $B$ are compact and $g$ is 
continuous, see e.g. Sorin [70]), the projected game with payoff $\langle g(x,y), u \rangle$ has a value 
(at least for every $u$ such that $\sup_{c \in \mathcal{C}} \langle c, u \rangle < +\infty$). So $\mathcal{C}$ is approachable with respect 
to action sets $\mathcal{X}$ and $\mathcal{Y}$. In particular, there exists an approachability strategy such that 
the averages of observed payoffs $\gamma_n = g(x_n, b_n)$, where $x_n \in \mathcal{X}$ is the action dictated to 
be played at stage $n$, converge to $\mathcal{C}$ - and the rate of convergence is $O(1/\sqrt{n})$. 

Similarly to Section 1.3.1, this is an approachability strategy of $\mathcal{C}$ since $\bar g_n - \bar\gamma_n$ is again 
an average of bounded martingale differences, and concentration inequalities for sums of 
bounded martingale differences in any Hilbert space, see e.g. Chen & White [15], imply 
that, in expectation and with great probability, $\|\bar g_n - \bar\gamma_n\| \le O(1/\sqrt{n})$. Almost sure 
convergence is again a consequence of Lemma 5.4. 

1.3.4 Approachability with activation 

This section is concerned with the case where only a fragment of all coordinates of the 
payoff vector (belonging to $L^2(\Omega, \mu, \mathcal{F})$) are active at each stage. Formally, there exists a 
mapping $\chi : H \to L^2(\Omega, \mu, \mathcal{F})$ such that, after any finite history $h^n = (a_1, b_1, \dots, a_n, b_n)$, 
$\chi[h^n] \in L^2(\Omega, \mu, \mathcal{F})$ takes values in $\{0, 1\}$ and only the coordinates $\omega \in \Omega$ with $\chi[h^n](\omega) = 
1$ are active. In particular, whether a coordinate is active at a stage might depend on 
the choices of actions of this specific stage. We write $\chi_n = \chi[h^n]$ and also assume that $\sum_{m=1}^n \chi_m$ increases $\mu$-almost 
surely to infinity, no matter the pair of strategies. 

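In this framework, we denote tilted averages of payoffs by 

$$\bar g_{\chi,n} := \frac{\sum_{m=1}^n \chi_m\, g(a_m, b_m)}{\sum_{m=1}^n \chi_m} \quad \text{(with the convention that } \tfrac{0}{0} = 0\text{)}.$$

A set $\mathcal{E} \subset L^2(\Omega, \mu, \mathcal{F})$ is approachable if the player has a strategy $\sigma$ such that, for any 
strategy $\tau$ of Nature, the sequence $\bar g_{\chi,n} - \Pi_{\mathcal{E}}(\bar g_{\chi,n})$ converges to zero $\mu$-almost surely, 
for $\mathbb{P}_{\sigma,\tau}$-almost all infinite histories. 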
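
As a small illustration of the tilted average, here is a minimal sketch in the finite case $\Omega = \{1, \dots, d\}$, where the activation at each stage is simply a 0/1 vector. The function name `tilted_average` and the sample data are hypothetical; the only ingredients taken from the text are the coordinate-wise ratio and the convention 0/0 = 0.

```python
def tilted_average(payoffs, activations):
    """Coordinate-wise average of payoffs over the stages where the
    coordinate was active (convention: 0/0 = 0)."""
    d = len(payoffs[0])
    num = [0.0] * d   # sum of active payoffs, per coordinate
    den = [0] * d     # number of active stages, per coordinate
    for g, chi in zip(payoffs, activations):
        for k in range(d):
            num[k] += chi[k] * g[k]
            den[k] += chi[k]
    return [num[k] / den[k] if den[k] > 0 else 0.0 for k in range(d)]


# Illustrative data: three stages, three coordinates.
payoffs = [(1.0, 4.0, 7.0), (2.0, 5.0, 8.0), (3.0, 6.0, 9.0)]
activations = [(1, 0, 0), (1, 1, 0), (0, 1, 0)]
avg = tilted_average(payoffs, activations)
```

The third coordinate is never active in this sample, so its tilted average falls back to the 0/0 = 0 convention.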

We will only focus on product sets, that can be described by 

$$\mathcal{C} = \big\{ f \in L^2(\Omega, \mu, \mathcal{F}) \ ;\ f_0 \le f \text{ on } \Omega_0 \text{ and } f \le f_1 \text{ on } \Omega_1 \big\}$$

where $\Omega_0$ and $\Omega_1$ are two measurable subsets of $\Omega$, and $f_0, f_1 \in L^2(\Omega, \mu, \mathcal{F})$. The following 
theorem shows that, in this specific framework, a notion of tilted B-set is sufficient for 
approachability. 

Theorem 1.6 Let $\mathcal{C} \subset L^2(\Omega, \mu, \mathcal{F})$ be a product set. Then any strategy $\sigma$ such that, for 
any strategy $\tau$ of Nature, and for $\mathbb{P}_{\sigma,\tau}$-almost every infinite history, 

$$\big\langle \bar g_{\chi,n} - \Pi_{\mathcal{C}}(\bar g_{\chi,n}),\ g(x_{n+1}, y_{n+1}) - \Pi_{\mathcal{C}}(\bar g_{\chi,n}) \big\rangle \le 0,$$

where $x_{n+1} = \sigma(h^n)$ and $y_{n+1} = \tau(h^n)$, is an approachability strategy of $\mathcal{C}$. 

The proof is similar to the one of Theorem 1.4, except that Lemma 5.5 is used instead 
of Lemma 1.8, so it is omitted. 



The next proposition shows that approachability with activation of a product set 
$\mathcal{C} = \prod_{k=1}^d \mathcal{C}^k$ in Euclidian spaces can actually be reduced to usual approachability. 

The only condition is that activation at stage $n$ depends only on current actions (i.e., 
$\chi_n = \chi(a_n, b_n)$ where $\chi(a,b)$ might be a random variable); we also assume, without 
loss of generality, that the origin belongs to $\mathcal{C}$ and even that $\mathcal{C} = \prod_{k=1}^d [0, b^k]$. 

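Proposition 1.9 A product set $\mathcal{C} \subset \mathbb{R}^d$ is approachable with activation depending only 
on current actions if and only if the following convex set 

$$\widetilde{\mathcal{C}} := \Big\{ (z, \omega) \in \mathbb{R}^d \times \mathbb{R}^d_+ \ ;\ \Big(\frac{z^k}{\omega^k}\Big)_{k \in \{1,\dots,d\}} \in \mathcal{C} \Big\}, \quad \text{with the convention that } \tfrac{0}{0} = 0,$$

is approachable in the game with payoffs defined by 

$$\widetilde g_\chi(a,b) = \big( g_\chi(a,b),\ \chi(a,b) \big) \in \mathbb{R}^d \times \mathbb{R}^d_+, \quad \text{where } g_\chi(a,b) = \big( \chi^k(a,b)\, g^k(a,b) \big)_{k \in \{1,\dots,d\}}.$$

Moreover, there exists a strategy such that, in expectation, 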



dc {gx,n) < 



4|b 



>0 . 



re 



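Proof: Consider any fixed $(z, \omega) \in \mathbb{R}^d \times \mathbb{R}^d_+$; we can always assume that every coordinate 
of $\omega$ is different from 0. Indeed, since $\mathcal{C}$ is a product set, $d_{\mathcal{C}}\big((z^k/\omega^k)_k\big) = d_{\mathcal{C}}\big((\hat z^k/\hat\omega^k)_k\big)$ 
where $(\hat z^k, \hat\omega^k) = (z^k, \omega^k)$ if $\omega^k \neq 0$ and $(\hat z^k, \hat\omega^k) = (c^k, 1)$ with $c^k$ arbitrarily chosen in 
$\mathcal{C}^k$ if $\omega^k = 0$. 

Define $(z_c, \omega_c) \in \Pi_{\widetilde{\mathcal{C}}}(z, \omega)$ and $\underline{\omega}$ the smallest coordinate of $\omega$. Since $\mathcal{C} = \prod_{k=1}^d [0, b^k]$, 
then $\widetilde{\mathcal{C}} = \prod_{k=1}^d \{ (z^k, \omega^k) \ ;\ 0 \le z^k \le b^k \omega^k \}$, thus necessarily 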

fcllfflloo + 1 



b'' + 1 



< (llfflloo + 1)0;" 



As a consequence. 



Reciprocally, 



£ _ £e 

a; cjf, 



< — Lz — Ze\\ + U sup 



00^ UJ^ 



<2|b| 



UJ 



d^{z,u) < {z,u:) - y^LoIicy— j j ^,oj 



< 



z „ ( z 
--He - 



Finally, if $\mathcal{C}$ is a product set containing 0, then $\widetilde{\mathcal{C}}$ is a convex cone. The result is a 
consequence of Blackwell's characterization of approachable sets. ■ 



Assuming that the origin belongs to the product set $\mathcal{C}$ is of course non-restrictive: 
one can always translate the origin to any point. Moreover, in some cases, the 
product set property can be relaxed. For instance, if there exist two coordinates $\ell$ and 
$\ell'$ that are always active together, i.e., if $\chi(a,b)^{\ell} = \chi(a,b)^{\ell'}$ for every pair $(a,b)$, then 
the result holds if $\mathcal{C} := \prod_{k \notin \{\ell, \ell'\}} \mathcal{C}^k \times \mathcal{C}^{\ell,\ell'}$ where the convex set $\mathcal{C}^{\ell,\ell'} \subset \mathbb{R}^2$ does not need 
to be a product set. 

1.3.5 Variable stage duration 

Cesaro averages of payoffs are considered in the usual definition of approachability. In 
this section, we assume instead that not all stages have the same 
weight (when computing averages) or, equivalently, that they do not have the same 
duration: payoffs obtained on long stages must have more importance than those obtained on 
short stages. We distinguish two classes of variable and random stage duration, according to whether 
or not they depend on the actions chosen. 

Assume for the moment that $\omega_n$, the possibly random length (or weight) of the $n$-th 
stage, is independent of the actions chosen by the player and Nature. In this context, $\sigma$ is an 
approachability strategy of a closed set $\mathcal{E}$ if 

$$\bar g_{\omega,n} := \frac{\sum_{m=1}^n \omega_m\, g_m}{\sum_{m=1}^n \omega_m}$$

converges to $\mathcal{E}$, $\mathbb{P}_{\sigma,\tau}$-almost surely, uniformly with respect to the strategy $\tau$ of Nature. It will be 
convenient to define $\Omega_n = \sum_{m=1}^n \omega_m$. 
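
The weighted average can be maintained online, which is what a strategy driven by it would need. The sketch below is only an illustration under assumed sample data: it uses the identity $\bar g_{\omega,n} = \bar g_{\omega,n-1} + (\omega_n/\Omega_n)(g_n - \bar g_{\omega,n-1})$, which follows directly from the definition, and the function name `weighted_average_path` is hypothetical.

```python
def weighted_average_path(payoffs, weights):
    """Return the list of weighted averages after each stage, updated
    incrementally with step size w_n / (w_1 + ... + w_n)."""
    avg = [0.0] * len(payoffs[0])
    total = 0.0                      # running sum of the weights
    path = []
    for g, w in zip(payoffs, weights):
        total += w
        step = w / total
        avg = [a + step * (gk - a) for a, gk in zip(avg, g)]
        path.append(list(avg))
    return path


# Illustrative data: polynomial weights n^alpha with alpha = 1/2.
payoffs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (0.5, -1.0)]
weights = [n ** 0.5 for n in range(1, 5)]
path = weighted_average_path(payoffs, weights)

# Direct computation of the final weighted average, for comparison.
total_weight = sum(weights)
direct = [sum(w * g[k] for g, w in zip(payoffs, weights)) / total_weight
          for k in range(2)]
```

With all weights equal to 1 the step size is $1/n$ and the update reduces to the usual Cesaro recursion.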

Proposition 1.10 Let $\mathcal{E} \subset \mathbb{R}^d$ be a closed B-set. Then Blackwell's strategy applied to 
the sequence of weighted averages $\bar g_{\omega,n}$ ensures that for every $n \in \mathbb{N}$ and $\eta > 0$, 



de {guj,n) 



< 



lyn 



OH 



and 



P 



a,T|3m >n,de {g^^m) > < 



En 
rn 



+ 



E 

k=n+l 



K 



The proof is identical to the one with Cesaro averages (the case $\omega_n = 1$ for every $n \in \mathbb{N}$) 
and is thus omitted. In particular, for any polynomial weights, i.e. if $\omega_n = n^\alpha$ with $\alpha > -1$, a 
B-set is approachable at the rate of convergence of $O(1/\sqrt{n})$, which is independent of 
$\alpha$ - only the constant depends on $\alpha$, see e.g. Mannor, Perchet & Stoltz [52]. 

In fact, as we shall see in the following Section 1.3.6, a B-set is approachable as soon 
as the usual Robbins-Monro assumptions are satisfied almost surely: 

$$\sum_{n \in \mathbb{N}} \frac{\omega_n}{\Omega_n} = +\infty \quad \text{and} \quad \sum_{n \in \mathbb{N}} \Big(\frac{\omega_n}{\Omega_n}\Big)^2 < +\infty.$$

We now turn to the case where a stage length might depend on the actions of the 
player and Nature. For simplicity, we assume that there exists a mapping $\omega : A \times B \to 
[\underline{\omega}, \overline{\omega}] \subset (0, 1]$ such that $\omega_n := \omega(a_n, b_n)$. Approachability in this framework can be 
reduced to regular approachability, similarly to what has been done with activation. 

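Proposition 1.11 A closed set $\mathcal{E} \subset \mathbb{R}^d$ is approachable with respect to weighted averages 
if and only if the following cone $\widetilde{\mathcal{E}}$ is approachable with Cesaro averages 

$$\widetilde{\mathcal{E}} = \Big\{ (z, \omega) \in \mathbb{R}^d \times [\underline{\omega}, \overline{\omega}] \ ;\ \frac{z}{\omega} \in \mathcal{E} \Big\}.$$

Moreover, if $\mathcal{E}$ is convex then $\widetilde{\mathcal{E}}$ is also convex, thus $\mathcal{E}$ is approachable with respect to 
weighted averages if and only if 

$$\forall y \in \Delta(B),\ \exists x \in \Delta(A), \quad \frac{\mathbb{E}_{x,y}\big[\omega(a,b)\, g(a,b)\big]}{\mathbb{E}_{x,y}\big[\omega(a,b)\big]} \in \mathcal{E}.$$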



Proof: Let $(z, \omega) \in \mathbb{R}^d \times [\underline{\omega}, \overline{\omega}]$ and $(z_e, \omega_e) \in \Pi_{\widetilde{\mathcal{E}}}(z, \omega)$, then 



de 





Z Zf, 


1 






< — 






UJ 



\Z — Zp 



1 1 

UJ Ue 



< 



+ 



UJ 



UJ^ 



da{z, uj) 
As before, one has reciprocally, 



< UJ 



If $\mathcal{E}$ is convex, then $\widetilde{\mathcal{E}}$ is a convex cone and the characterization of approachable convex 
sets (in this framework due to Mannor and Shimkin [49]) is a simple consequence of 
Condition (8). ■ 
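
The reduction behind this proposition rests on a simple identity: the Cesaro average of the lifted payoffs $(\omega_m g_m, \omega_m)$, once its first block is divided by its last coordinate, is exactly the weighted average of the $g_m$. The sketch below checks this on assumed sample data; the helper names `lifted_payoff` and `cesaro_average` are hypothetical.

```python
def lifted_payoff(g, w):
    """Map a payoff g with stage duration w to the lifted vector (w * g, w)."""
    return [w * gk for gk in g] + [w]


def cesaro_average(vectors):
    """Plain (unweighted) average of a list of vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


# Illustrative payoffs and durations in (0, 1].
payoffs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
durations = [0.5, 1.0, 0.25]

lifted = [lifted_payoff(g, w) for g, w in zip(payoffs, durations)]
z_bar = cesaro_average(lifted)

# Dividing the first block by the averaged duration recovers the weighted average.
ratio = [zk / z_bar[-1] for zk in z_bar[:-1]]
weighted = [sum(w * g[k] for g, w in zip(payoffs, durations)) / sum(durations)
            for k in range(2)]
```

So the weighted average lies in the target set exactly when the Cesaro average of the lifted payoffs lies in the lifted set.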



1.3.6 Unbounded payoffs and strong law of large numbers 



At the end of Section 1.1.1, we noticed that we can assume that Nature chooses outcomes 
$U$ in some given compact set $\mathcal{U} \subset (\mathbb{R}^d)^A$ such that the player's payoff is, in expectation, 
$x.U$. The fact that $\mathcal{U}$ is a fixed compact set can be weakened (similarly to Stoltz [74]), 
and we can assume that $U_n$ belongs to $\mathcal{U}_n \subset (\mathbb{R}^d)^A$ as long as 

|2 ^ rn ,|21 



El I 1 1 
(n + l)2 

new ' 



< oo or even 



n6lN 



\an-Un\ 



(n + 1) 



< oo , 



with a convergence uniform with respect to Nature's strategy. 

Indeed, under this assumption, the proof of Theorem 1.1 does not change, i.e., if 
condition (2) is satisfied at every stage, then the same arguments yield that 



< 2- 



and Zn is a supermartingale such that 



la U IP 

I'^m-'^ m\\ 



+ \\£a\\' 



Zn 



< 4-! 



\\£n 



+ 2 



Ur, 



+ 



n 



oo 

E 

m=n+l 



lEa-,r 



Urr 



By assumption, every term goes to zero uniformly with respect to Nature's strategy, 
hence $\mathcal{E}$ is approachable. 

This sheds new light on approachability theory: it can be seen as a generalization of 
Kolmogorov's strong law of large numbers (see Feller [20], chapter X.7 and Mertens, Sorin 
& Zamir [55], exercise 4, page 104). Indeed, let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of independent 
random variables in $\mathbb{R}^d$ and define $v_n := \mathbb{E}\|X_n - \mathbb{E}[X_n]\|^2$. As soon as $\sum_{n \in \mathbb{N}} v_n/n^2$ is 
bounded, $\frac{1}{n}\sum_{m=1}^n \big(X_m - \mathbb{E}[X_m]\big)$ converges almost surely to 0; moreover 

$$\mathbb{P}\Big\{ \exists m \ge n;\ \Big\| \frac{1}{m}\sum_{k=1}^m \big(X_k - \mathbb{E}[X_k]\big) \Big\| > \eta \Big\} \le \frac{1}{\eta^2}\Big( \frac{\sum_{m=1}^n v_m}{n^2} + \sum_{m=n+1}^{\infty} \frac{v_m}{m^2} \Big),$$

or even with an exponential decay (since $\{0\}$ is convex, see Section 1.2.2). 

Finally, the approachability bound (in expectation) matches the optimal bound in 
the law of large numbers and is thus, in some sense, optimal. Indeed, if $X_n$ is an i.i.d. 
sequence such that $X_n = \pm 1$ with probability 1/2, then by denoting $\mathcal{E} = \{0\}$, one has 

$$\mathbb{E}\big[ d_{\mathcal{E}}^2(\bar X_n) \big] = \mathbb{E}\big[ |\bar X_n|^2 \big] = \frac{1}{n}.$$
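
The value of the second moment in this coin-flip example can be checked exactly by summing over the binomial distribution of the number of +1 outcomes. This is only a sanity check of the arithmetic; the function name `second_moment_of_mean` is hypothetical.

```python
from fractions import Fraction
from math import comb


def second_moment_of_mean(n):
    """Exact value of E[(S_n / n)^2], where S_n is a sum of n independent
    signs equal to +1 or -1 with probability 1/2 each."""
    total = Fraction(0)
    for plus in range(n + 1):
        s = 2 * plus - n                       # value of S_n with `plus` signs equal to +1
        prob = Fraction(comb(n, plus), 2 ** n)  # binomial probability
        total += prob * Fraction(s * s, n * n)
    return total


moments = {n: second_moment_of_mean(n) for n in (1, 2, 5, 10)}
```

Exact rational arithmetic is used so that the comparison with $1/n$ involves no rounding.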



1.3.7 Bounded memory 

Blackwell's approachability strategy does not require knowing at each stage the whole 
sequence of past payoffs, but only the current average. Nonetheless, to update this 
average either the stage number or the complete history must be kept in memory, which 
of course requires a memory of increasing size. This is why the question arises of whether it is 
possible to approach a closed set $\mathcal{E}$ using simpler strategies, for example implementable 
by a finite automaton or with a finite memory. 

A strategy $\sigma$ has a bounded memory of size $M \in \mathbb{N}$ if, for every finite history $h^n \in H_n$, 
$\sigma(h^n)$ depends only on $(a_{n-M+1}, b_{n-M+1}, \dots, a_n, b_n)$, i.e. the last $M$ profiles of actions 
played. Lehrer & Solan [44, 46] proved that an approachable convex set $\mathcal{C}$ remains 
approachable by a player who is restricted to use strategies with a bounded memory of 
size $M \in \mathbb{N}$; indeed, the average payoff converges to some $O(1/\sqrt{M})$-neighborhood of $\mathcal{C}$. 

The basic idea is relatively natural: play Blackwell's strategy on a block of size $M$, 
then erase the memory and start over. It is only necessary to encode the beginning (and 
the end) of a block, but this can be done using $\sqrt{M}$ stages, for example by playing 
always the same action and by ensuring that no such sequence appears within the same block. 
The average payoff on each block will be $1/\sqrt{M}$-close to $\mathcal{C}$, which is convex, hence the 
overall average payoff is also $1/\sqrt{M}$-close to $\mathcal{C}$. 

On the other hand, Zapechelnyuk [82] considered the strategy with bounded memory 
directly adapted from Blackwell's, that is defined by $\sigma(h^n) = x(\bar g_n^M)$, where $x(\cdot)$ is given 
by the definition of a B-set and $\bar g_n^M$ is the average payoff on the last $M$ stages. For 
instance, consider this strategy in the game where the payoffs of the player (who 
chooses a row) are given by the following matrix: 

          L         R 
    T   (0,-1)    (0,1) 
    B   (1,0)     (-1,0) 

and $\mathcal{C} = \mathbb{R}^2_-$. For $M$ big enough, there exists a strategy of Nature such that the sequence 
$(\bar g_n^M)_{n \in \mathbb{N}}$ has a cycle (of length either $2M$ or $2M + 2$). Roughly speaking, this cycle 
is defined by four successive blocks of lengths $M/2$ (or $M/2 + 1$) where, within a block, 
the same pair of actions (except on at most one stage) is played. One can show that 
the order of these actions is $(T, L)$, $(T, R)$, $(B, R)$ and $(B, L)$. 

At the end of the blocks $(B, R)$ and $(T, L)$, $\bar g_n^M$ is close, respectively, to $(-1/2, 1/2)$ 
or $(1/2, -1/2)$. So it is at a distance of around 1/2 from $\mathcal{C}$, and the sequence $(\bar g_n^M)_{n \in \mathbb{N}}$ 
of averages of payoffs on the last $M$ stages does not converge to $\mathcal{C}$. 
However, nothing indicates whether the sequence $\bar g_n$ of full averages does or does not converge to $\mathcal{C}$ 
(which is the case in this example). 
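
The arithmetic of this cycle can be replayed directly. The sketch below does not derive Nature's strategy: it hard-codes the idealized cycle described above (four blocks of exactly $M/2$ stages, ignoring the possible one-stage deviations) and evaluates the sliding-window average at the end of each block. The names `window_average` and `dist_to_negative_orthant` are hypothetical.

```python
# Payoff matrix of the example: rows T, B for the player, columns L, R for Nature.
PAYOFF = {("T", "L"): (0, -1), ("T", "R"): (0, 1),
          ("B", "L"): (1, 0), ("B", "R"): (-1, 0)}
CYCLE = [("T", "L"), ("T", "R"), ("B", "R"), ("B", "L")]


def window_average(history, M):
    """Average payoff over the last M stages of the history."""
    last = history[-M:]
    return tuple(sum(PAYOFF[p][k] for p in last) / len(last) for k in range(2))


def dist_to_negative_orthant(z):
    """Euclidean distance from z to the negative orthant."""
    return sum(max(c, 0.0) ** 2 for c in z) ** 0.5


M = 100
history = []
ends = {}                       # window average at the end of each block
for _ in range(3):              # a few full cycles, so the window is always full
    for profile in CYCLE:
        history.extend([profile] * (M // 2))
        ends[profile] = window_average(history, M)
```

At the end of the $(B,R)$ and $(T,L)$ blocks the window contains two half-blocks whose payoffs do not cancel, which is where the distance 1/2 comes from.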

1.4 Alternative techniques and proofs of approachability 
1.4.1 Approachability in continuous time 

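Benaïm, Hofbauer & Sorin [7] noticed that Blackwell's approachability strategy of a 
B-set $\mathcal{E}$ satisfies the following recurrence relation: conditionally on $h^n$, 

$$\mathbb{E}\big[\bar g_{n+1} - \bar g_n \,\big|\, h^n\big] \in \frac{1}{n+1}\big( \Gamma(\bar g_n) - \bar g_n \big),$$

where $\Gamma(z) = \big\{ \omega \in \mathbb{R}^d;\ \|\omega\|_\infty \le \|g\|_\infty \text{ and } \exists p \in \Pi_{\mathcal{E}}(z),\ \langle z - p, \omega - p \rangle \le 0 \big\}$. Therefore, 
the sequence of average payoffs $(\bar g_n)_{n \in \mathbb{N}}$ is a Discrete Stochastic Approximation (a 
DSA for short) of $\mathbf{g}$, solution of the associated ordinary differential inclusion 

$$\dot{\mathbf{g}} \in \Gamma(\mathbf{g}) - \mathbf{g}, \quad \mathbf{g}(0) = g_0 \in \mathbb{R}^d.$$

The derivative of the mapping $\delta(t) = d_{\mathcal{E}}^2(\mathbf{g}(t))$ satisfies $\delta'(t) \le -2\delta(t)/t$, thus it is a 
Lyapounov function and $\delta(t) \le \delta(0)t^{-2}$. As a consequence, $\mathbf{g}$ converges to $\mathcal{E}$ and, as a 
DSA, the sequence $(\bar g_n)_{n \in \mathbb{N}}$ converges a.s. to $\mathcal{E}$. However, rates of convergence of DSA 
are usually not explicit and might not be uniform. 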

To circumvent this issue, one might consider procedures in law, as defined in 
Section 1.3.1, that are deterministic and thus can be represented as an Euler scheme of the 
associated ordinary differential inclusion. They might provide explicit rates as the 
difference between the average payoff and its expectation converges to zero, and is controlled 
by concentration inequalities (see Sorin [72] or Kwon [40]). 

As Soulaimani, Quincampoix & Sorin [2] have considered an auxiliary differential 
game $\mathcal{D}$ where the control spaces of the player and Nature are respectively $\mathcal{X} = \Delta(A)$ and 
$\mathcal{Y} = \Delta(B)$ and the game dynamic is given by: 

$$\frac{d}{dt}\mathbf{g}(t) = \frac{-\mathbf{g}(t) + g(x(t), y(t))}{t}, \quad \mathbf{g}(0) = 0.$$

The intuition is that $\mathbf{g}(t) = \frac{1}{t}\int_0^t g(x(s), y(s))\,ds$ is the average payoff at time $t$. The 
change of variables $t = e^s$ and $\tilde{\mathbf{g}}(s) = \mathbf{g}(e^s)$ modifies the dynamic into 

$$\frac{d}{ds}\tilde{\mathbf{g}}(s) = -\tilde{\mathbf{g}}(s) + g(x(s), y(s)) := f(\tilde{\mathbf{g}}(s), x(s), y(s)), \quad \tilde{\mathbf{g}}(0) = \mathbf{g}(1).$$

This transformation proves the characterization of a B-set given earlier. Indeed, 
a set $\mathcal{E}$ is approachable if the player can force the dynamic to stay within $\mathcal{E}$. Therefore, 
a closed set $\mathcal{E}$ is a B-set if and only if it is a discriminating domain for the player with 
respect to the dynamic $f$, i.e. if 

$$\forall p \in \mathcal{E},\ \forall q \in NC_{\mathcal{E}}(p), \quad \sup_{y \in \mathcal{Y}} \inf_{x \in \mathcal{X}} \langle f(p, x, y), q \rangle \le 0.$$

1.4.2 Information-based strategies 



Blackwell's strategy is a payoff-based strategy as the relevant running state variable is 
the sequence of average payoffs. We develop in this section a conceptually completely 
different kind of strategy based on the sequence of observed profiles of actions played, as 
in Perchet & Quincampoix [63] or Mannor, Perchet & Stoltz [51]. 

The basic idea follows from the following simple fact. Define $\theta_n = \delta_{(a_n, b_n)} \in \Delta(A \times B)$ 
as the Dirac mass on $(a_n, b_n) \in A \times B$ and let $\bar\theta_n = \sum_{m=1}^n \theta_m / n$ be their average. By 
definition, $\bar g_n = \mathbb{E}_{\bar\theta_n}[g(a,b)]$ belongs to $\mathcal{E}$ if and only if $\bar\theta_n$ belongs to the following set 

$$\widetilde{\mathcal{E}} := \big\{ \theta \in \Delta(A \times B) \text{ s.t. } \mathbb{E}_\theta[g(a,b)] \in \mathcal{E} \big\} \subset \Delta(A \times B).$$

If $\mathcal{E}$ is closed and convex, then $\widetilde{\mathcal{E}}$ (seen as a subset of $\mathbb{R}^{AB}$) is also closed and convex; 
it remains to compare distances to $\mathcal{E}$ and to $\widetilde{\mathcal{E}}$. 
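
The identity between the average payoff and the expected payoff under the empirical distribution of action profiles is easy to check numerically. The following is a minimal sketch on an assumed 2x2 game with illustrative payoff vectors; the function names are hypothetical.

```python
from collections import Counter

# Illustrative payoff vectors g(a, b); these numbers are an assumption.
g = {
    ("T", "L"): (0.0, -1.0),
    ("T", "R"): (0.0, 1.0),
    ("B", "L"): (1.0, 0.0),
    ("B", "R"): (-1.0, 0.0),
}


def average_payoff(history):
    """Cesaro average of the payoffs along a history of action profiles."""
    n = len(history)
    return [sum(g[p][k] for p in history) / n for k in range(2)]


def empirical_distribution(history):
    """Average of the Dirac masses on the played profiles."""
    n = len(history)
    return {profile: count / n for profile, count in Counter(history).items()}


def expected_payoff(theta):
    """Expectation of g(a, b) when (a, b) is drawn from theta."""
    return [sum(prob * g[p][k] for p, prob in theta.items()) for k in range(2)]


history = [("T", "L"), ("T", "R"), ("B", "R"), ("T", "L"), ("B", "L")]
theta_bar = empirical_distribution(history)
g_bar = average_payoff(history)
g_of_theta = expected_payoff(theta_bar)
```

Only the frequencies of the profiles matter, not their order, which is why the strategy can be driven by the empirical distribution alone.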

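Lemma 1.12 There exists $\gamma > 0$ such that, for any probability measure $\theta \in \Delta(A \times B)$ 
and any set $\mathcal{E}$, 

$$\gamma\, d_{\widetilde{\mathcal{E}}}(\theta) \le d_{\mathcal{E}}\big(\mathbb{E}_\theta[g(a,b)]\big) \le \|g\|_\infty \sqrt{AB}\; d_{\widetilde{\mathcal{E}}}(\theta).$$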

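Proof: For any $\theta \in \Delta(A \times B)$, define $g(\theta) = \mathbb{E}_\theta[g(a,b)]$. Let $\underline{\theta} \in \Pi_{\widetilde{\mathcal{E}}}(\theta)$, so $g(\underline{\theta}) \in \mathcal{E}$ and 

$$d_{\mathcal{E}}(g(\theta)) \le \|g(\theta) - g(\underline{\theta})\| = \Big\| \sum_{a,b} \big(\theta(a,b) - \underline{\theta}(a,b)\big) g(a,b) \Big\| \le \|g\|_\infty \|\theta - \underline{\theta}\|_1 \le \|g\|_\infty \sqrt{AB}\, \|\theta - \underline{\theta}\|_2.$$

This gives the second inequality. 

For the first inequality, notice that $g : \Delta(A \times B) \subset \mathbb{R}^{AB} \to \mathrm{co}\{g(a,b)\}$ is a linear 
mapping, so its inverse $g^{-1}$ is piecewise linear thus Lipschitz, see e.g., Billera & 
Sturmfels [8], bottom of page 530, or Walkup & Wets [81]. As a consequence, there exists $\lambda > 0$ 
such that for every $z, z' \in \mathrm{co}\{g(a,b)\}$ and any point $\theta$ such that $g(\theta) = z$, there exists 
$\theta'$ such that $g(\theta') = z'$ and $\|\theta - \theta'\| \le \lambda \|z - z'\|$. In particular, for every $\theta \in \Delta(A \times B)$, 
if $z \in \Pi_{\mathcal{E}}(g(\theta))$ there exists $\underline{\theta}$ such that $g(\underline{\theta}) = z$, thus $\underline{\theta} \in \widetilde{\mathcal{E}}$ and 

$$d_{\widetilde{\mathcal{E}}}(\theta) \le \|\theta - \underline{\theta}\| \le \lambda\, \|g(\theta) - g(\underline{\theta})\| = \lambda\, \|g(\theta) - \Pi_{\mathcal{E}}(g(\theta))\| = \lambda\, d_{\mathcal{E}}(g(\theta))$$

and one just has to take $\gamma = 1/\lambda$. ■ 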



The consequence of this lemma is that an approachability strategy for $\widetilde{\mathcal{E}}$ is an 
approachability strategy for $\mathcal{E}$ (and reciprocally); apart from the requirement to compute 
$\widetilde{\mathcal{E}}$, only constants in rates of convergence deteriorate. 

The main advantage of this new kind of algorithm is that it does not rely on the 
observed sequences of payoffs. For example, consider the case where payoffs are not 
vectors in some Euclidian space but in some arbitrary normed space (or even where payoffs 
are subsets of this space). If the image space is not Hilbertian, then Blackwell's 
proofs no longer hold; on the other hand, the transformation of sequences of payoffs into 
sequences of profiles of actions remains valid. Therefore, we get this very general version 
of the characterization of approachable convex sets. 

Theorem 1.7 Let $(H, \mathcal{N}(\cdot))$ be any normed space (not necessarily Hilbertian) and $g : 
\Delta(A) \times \Delta(B) \to H$ (or $g : A \times B \to H$ if $A$ and $B$ are some compact convex sets) any 
continuous bi-linear mapping. Then Blackwell's characterization of approachable convex 
sets holds: 

A convex set $\mathcal{C} \subset H$ is approachable if and only if $\forall y \in \Delta(B),\ \exists x \in \Delta(A),\ g(x,y) \in \mathcal{C}$. 

The result is already proved if $A$ and $B$ are finite. If they are some compact convex 
sets and $g$ is continuous, then one can discretize them to get $\varepsilon$-approachability strategies. 
Since $\mathcal{C}$ is convex, these can be concatenated into an approachability strategy (using the 
doubling trick). 

From the point of view of computational geometry, this result is rather intuitive. 
Indeed, no matter the image space, $\mathrm{co}\{g(a,b);\ a \in A, b \in B\}$ is a polytope with at most 
$AB$ vertices which belongs to an affine subspace of finite dimension at most $AB - 1$. Up 
to a renormalization, this gives Theorem 1.7. However, in the case where $H = L^2(\Omega, \mu, \mathcal{F})$, 
this does not directly imply previous results as the approachability is only in probability 
and not $\mu$-almost surely. 

1.4.3 Potential-based and uniform-norm approachability 

Approachability was first defined with respect to the $\ell^2$ distance. Roughly speaking, this 
induces a repeated game (see also the next subsection) between the player and Nature 
where the player minimizes the distance to the set $\mathcal{E}$ and Nature maximizes it. 
This can be generalized to a more general class of mappings $\Phi : \mathbb{R}^d \to \mathbb{R}$, called 
potentials, that are twice continuously differentiable (although this condition can be considerably 
weakened). 

An illustration of the interest of potential-based approachability is given in the 
following Corollary 1.16. It yields faster rates of convergence when distances to sets are 
defined with respect to the uniform norm $\|\cdot\|_\infty$ instead of the Euclidian norm $\|\cdot\|_2$. 

Let us denote by $\delta$ the minimum level of $\Phi$ that the player can guarantee in expectation 
if he plays second, i.e. 

$$\delta = \inf\big\{ \alpha \in \mathbb{R} \text{ s.t. } \forall y \in \Delta(B),\ \exists x \in \Delta(A),\ \Phi(g(x,y)) \le \alpha \big\} \quad \text{and} \quad \mathcal{E}_\delta := \Phi^{-1}\big((-\infty, \delta]\big).$$

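Theorem 1.8 Assume that, for every $z$ outside $\mathcal{E}_\delta$, the gradient $\nabla\Phi(z)$ points 
sufficiently towards $z$, i.e., there exists $\beta > 0$ such that 

$$\forall z \notin \mathcal{E}_\delta,\ \exists x := x(z) \in \Delta(A) \text{ s.t. } \langle \nabla\Phi(z),\ g(x,y) - z \rangle \le -\beta\big(\Phi(z) - \delta\big),\ \forall y \in \Delta(B). \quad (12)$$

Then, no matter the strategy of Nature, choosing $x_{n+1} = x(\bar g_n)$ yields, in expectation, 

$$\Phi(\bar g_n) - \delta \le \frac{\kappa_\Phi}{(\beta - 1)\,n} \ \text{ if } \beta > 1 \quad \text{and} \quad \Phi(\bar g_n) - \delta \le \kappa_\Phi \frac{\log(n) + 1}{n^\beta} \ \text{ if } 0 < \beta \le 1,$$

where $\kappa_\Phi$ is a constant depending only on $\Phi$. 

If $\beta = 0$ but the inequality is strict in (12), then uniform convergence still holds yet 
at a non-explicit rate. 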

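Proof: First, notice that we can focus on the case where $\delta = 0$. The proof follows from 
Hart & Mas-Colell [32] and Sorin [71] (see also Cesa-Bianchi & Lugosi [13, 14]) and is 
based on a Taylor expansion of $\Phi$. Indeed, since $\bar g_{n+1} = \bar g_n + (g_{n+1} - \bar g_n)/(n+1)$ and $\Phi$ 
is $C^2$, there exists some $\xi_n \in [\bar g_{n+1}, \bar g_n]$ such that 

$$\Phi(\bar g_{n+1}) = \Phi(\bar g_n) + \frac{1}{n+1} \langle \nabla\Phi(\bar g_n),\ g_{n+1} - \bar g_n \rangle + \frac{1}{2(n+1)^2} (g_{n+1} - \bar g_n)'\, D^2\Phi(\xi_n)\, (g_{n+1} - \bar g_n)$$

where $\nabla\Phi$ and $D^2\Phi$ are respectively the gradient and the Hessian of $\Phi$; since the latter 
is continuous and every $\bar g_n$ belongs to the same compact set, there exists $\kappa_\Phi$ such that $(g_{n+1} - 
\bar g_n)'\, D^2\Phi(\xi_n)\, (g_{n+1} - \bar g_n) \le \kappa_\Phi$ for every $n \in \mathbb{N}$. As a consequence, one has 

$$\mathbb{E}\big[\Phi(\bar g_{n+1})\big] \le \Big(1 - \frac{\beta}{n+1}\Big) \mathbb{E}\big[\Phi(\bar g_n)\big] + \frac{\kappa_\Phi}{(n+1)^2}$$

and the result follows from a simple induction when $\beta > 1$. When $0 < \beta \le 1$, the bound 
is a consequence of the fact that 

$$\Big(1 - \frac{\beta}{n+1}\Big) \frac{\log(n) + 1}{n^\beta} + \frac{1}{(n+1)^2} \le \frac{\log(n+1) + 1}{(n+1)^\beta}.$$

The proof is a bit more intricate for $\beta = 0$ (along with a strict inequality in (12)), but 
we can use the fact that $\bar g_n$ is a D.S.A. of the following differential inclusion 

$$\dot{\mathbf{g}} \in \Lambda_\Phi(\mathbf{g}) - \mathbf{g} \quad \text{with} \quad \Lambda_\Phi(z) = \big\{ \omega \in \mathbb{R}^d;\ \|\omega\| \le \|g\|_\infty \text{ and } \langle \nabla\Phi(z), \omega - z \rangle < 0 \big\}.$$

The mapping $t \mapsto \Phi(\mathbf{g}(t))$ is a Lyapounov function since, if $\mathbf{g}(t) \notin \mathcal{E}_\delta$, 

$$\frac{d}{dt}\Phi(\mathbf{g}(t)) = \langle \nabla\Phi(\mathbf{g}(t)), \dot{\mathbf{g}}(t) \rangle \in \langle \nabla\Phi(\mathbf{g}(t)),\ \Lambda_\Phi(\mathbf{g}(t)) - \mathbf{g}(t) \rangle < 0,$$

therefore $\mathbf{g}$ converges to $\mathcal{E}_\delta$ and so does $\bar g_n$. ■ 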
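
The induction step in the regime $0 < \beta \le 1$ can be explored numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it iterates the worst case of a recursion of the form $u_{n+1} = (1 - \beta/(n+1))\,u_n + \kappa/(n+1)^2$, started from $u_1 = \kappa$, and compares it with the candidate bound $\kappa(\log(n)+1)/n^\beta$. The names `worst_case_potential` and `claimed_bound` and the chosen values of $\beta$ and $\kappa$ are hypothetical.

```python
from math import log


def worst_case_potential(beta, kappa, horizon):
    """Iterate u_{n+1} = (1 - beta/(n+1)) * u_n + kappa/(n+1)^2 from u_1 = kappa.
    Returns [u_1, ..., u_horizon]."""
    u = kappa
    values = [u]
    for n in range(1, horizon):
        u = (1.0 - beta / (n + 1)) * u + kappa / (n + 1) ** 2
        values.append(u)
    return values


def claimed_bound(beta, kappa, n):
    """Candidate upper bound kappa * (log(n) + 1) / n^beta."""
    return kappa * (log(n) + 1.0) / n ** beta


kappa, horizon = 2.0, 2000
checks = {}
for beta in (0.25, 0.5, 1.0):
    values = worst_case_potential(beta, kappa, horizon)
    checks[beta] = all(values[n - 1] <= claimed_bound(beta, kappa, n) + 1e-12
                       for n in range(1, horizon + 1))
```

For $\beta = 1$ the iteration gives $n\,u_n = \kappa(1 + 1/2 + \dots + 1/n)$, so the bound reduces to the classical comparison between the harmonic sum and $\log(n) + 1$.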



If $\mathcal{C}$ and $\Phi$ are convex, then Equation (12) always holds with $\beta = 1$, and we recover 
Theorem 7.6 of Cesa-Bianchi & Lugosi [14] (due to Hart & Mas-Colell [32]): 

Corollary 1.13 If $\mathcal{C} = \Phi^{-1}\big((-\infty, 0]\big)$ for some convex, twice continuously differentiable 
mapping $\Phi$ whose Hessian is bounded in norm by $\kappa_\Phi$ on $\mathrm{co}\{g(a,b)\}$, there exists a 
strategy such that, in expectation and no matter the strategy of Nature, $\Phi(\bar g_n) \le 2\kappa_\Phi \frac{\log(n) + 1}{n}$. 

The assumption that $\Phi$ is twice continuously differentiable can be easily weakened, 
in particular as soon as the constant $\kappa_\Phi$ exists. The next proposition is concerned with 
the sequence of sums of payoffs $G_n = \sum_{m=1}^n g_m$ instead of averages. It will be used, in 
some cases, to improve rates of convergence. 

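Proposition 1.14 Assume that 

$$\forall z \notin \mathcal{E}_\delta,\ \exists x := x(z) \in \Delta(A) \text{ s.t. } \langle \nabla\Phi(z), g(x,y) \rangle \le 0,\ \forall y \in \Delta(B)$$

and there exists $\kappa_\Phi > 0$ such that $g(x,y)'\, D^2\Phi(z)\, g(x,y) \le \kappa_\Phi$ for every $x \in \Delta(A)$, 
$y \in \Delta(B)$ and $z \notin \mathcal{E}_\delta$. Then, no matter Nature's strategy, choosing $x_{n+1} = x(G_n)$ yields 

Proof: This is a consequence of the fact that, for some $\xi_n \in [G_n, G_{n+1}]$, 

$$\Phi(G_{n+1}) = \Phi(G_n) + \langle \nabla\Phi(G_n), g_{n+1} \rangle + \frac{1}{2}\, g_{n+1}'\, D^2\Phi(\xi_n)\, g_{n+1},$$

followed by an immediate induction. ■ 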



This result can be immediately extended if $\Phi$ is not $C^2$ but is such that 

HGn+l) - {HGn) + (V$(G„),<7n+l)) < 

As mentioned before, the following corollary shows a faster convergence if $\mathcal{C}$ is an 
approachable cone. Proposition 1.14 is used even more deeply below to get optimal rates of 
convergence (both in the number of stages and the dimension) for 
approachability with respect to the uniform norm. 

Corollary 1.15 If $\mathcal{C}$ is an approachable closed and convex cone, then Blackwell's strategy 
ensures that, no matter Nature's strategy and for every $n \in \mathbb{N}$, 

E.,.[dc(9„)]< 



n 



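Proof: First, if $\mathcal{C}$ is a cone then necessarily $\langle z - \Pi_{\mathcal{C}}(z), \Pi_{\mathcal{C}}(z) \rangle \le 0$, therefore the first 
condition of Proposition 1.14 is the characterization of the fact that $\mathcal{C}$ is a B-set. Second, 
if $\Phi(\cdot) = d_{\mathcal{C}}^2(\cdot)$ then $\Phi$ satisfies the second condition of Proposition 1.14 - or at least its 
straightforward extension - with $\kappa_\Phi = \|g\|_\infty^2$. 

Since $\mathcal{C}$ is a cone, $d_{\mathcal{C}}^2(\bar g_n) = \Phi(G_n)/n^2$ and the result follows. ■ 

For simplicity, we will assume that $\|g(a,b)\|_\infty \le 1$ and we only consider target sets 
such that, for some $b_k, c_k \le 1$, 

$$\mathcal{C} := \big\{ z \in \mathbb{R}^d \text{ s.t. } b_k \le z_k \le c_k,\ \forall k \in \{1, \dots, d\} \big\}.$$

The $\ell^\infty$-distance to this set is denoted by $d_{\mathcal{C}}^\infty$, i.e. $d_{\mathcal{C}}^\infty(z) = \inf_{c \in \mathcal{C}} \|z - c\|_\infty$. 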
Corollary 1.16 There exists a strategy $\sigma$ of the player such that, against any strategy 
$\tau$ of Nature and every $n \in \mathbb{N}$ and $\delta > 0$, with probability at least $1 - \delta$, 



< 14 



log(2d) 



and 



n 



Pa,r{'ic°(5J>'j}<16l 



n 



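Proof: We first prove a similar result in the specific case where $\mathcal{C} = \mathbb{R}^d_-$ is the negative 
orthant and if a horizon $N$ is known in advance. Then we will use a doubling trick 
to conclude for the orthant; we will finally show how to reduce approachability of any 
product set $\mathcal{C}$ to that of an orthant. 

Let $\Phi$ be the following potential, depending on a parameter $\eta > 0$ to be fixed later: 

$$\Phi(z) := \frac{1}{\eta} \log\Big( \sum_{k=1}^d e^{\eta z_k} \Big)$$

so that $d_{\mathcal{C}}^\infty(z) \le \Phi(z)$ and $\Phi(0) \le \log(d)/\eta$; moreover 

$$\nabla\Phi(z) = \Big( \frac{e^{\eta z_k}}{\sum_{j=1}^d e^{\eta z_j}} \Big)_{k \in \{1,\dots,d\}} \quad \text{and} \quad D^2\Phi(z) = \eta\, \mathrm{diag}\big(\nabla\Phi(z)\big) - \eta\, \nabla\Phi(z) \nabla\Phi(z)',$$

where $\mathrm{diag}(\lambda_i)$ is the matrix whose diagonal is $\lambda_1, \dots, \lambda_d$ and zero everywhere else. As a 
consequence, since $\mathcal{C}$ is approachable, the first condition of Proposition 1.14 is satisfied 
and $\omega' D^2\Phi(z)\, \omega \le \eta\, \omega'\, \mathrm{diag}\big(\nabla\Phi(z)\big)\, \omega - \eta\, \big(\nabla\Phi(z)' \omega\big)^2 \le \eta \|\omega\|_\infty^2$ implies the second one. 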
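
The two properties of this exponential potential that the argument relies on can be checked numerically: it is sandwiched between the largest coordinate and the largest coordinate plus $\log(d)/\eta$, and its gradient is a probability vector. The sketch below is a minimal illustration with assumed sample values; the function names are hypothetical, and the maximum is subtracted before exponentiating only for numerical stability.

```python
from math import exp, log


def potential(z, eta):
    """(1/eta) * log(sum_k exp(eta * z_k)), computed in a numerically stable way."""
    m = max(z)
    return m + log(sum(exp(eta * (zk - m)) for zk in z)) / eta


def gradient(z, eta):
    """Gradient of the potential: the softmax weights exp(eta z_k) / sum_j exp(eta z_j)."""
    m = max(z)
    w = [exp(eta * (zk - m)) for zk in z]
    s = sum(w)
    return [wk / s for wk in w]


z = [0.3, -0.2, 0.1, -1.0]   # illustrative point
eta = 2.0                    # illustrative parameter
phi = potential(z, eta)
grad = gradient(z, eta)
phi_at_zero = potential([0.0] * len(z), eta)
```

A larger $\eta$ tightens the upper bound $\log(d)/\eta$ but increases the curvature of the potential, which is the trade-off resolved by the later choice of $\eta$.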

So, convexity of $d_{\mathcal{C}}^\infty(\cdot)$, Proposition 1.14 and the choice $\eta = \sqrt{\log(d)/N}$ imply that 



lEo-.r 



N) 



N 



log(d) 
N ' 



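We now appeal to the doubling trick, that is, we consider the strategy 
consisting in playing by blocks of lengths $2^k$, following the potential associated with $\eta_k := 
\sqrt{\log(d)/2^k}$ on the $k$-th block and resetting everything at the beginning of a new block. 
A simple induction, based on the convexity of $d_{\mathcal{C}}^\infty$, shows that, at the end of any block, 

$$\mathbb{E}_{\sigma,\tau}\big[ d_{\mathcal{C}}^\infty(\bar g_{2^k - 1}) \big] \le 2(1 + \sqrt{2}) \sqrt{\frac{\log(d)}{2^k - 1}}.$$

Hence it remains to control distances within blocks. Yet, using the previous bound 
obtained for ends of blocks, one has for $n = 2^k - 1 + m$ with $m \le 2^k$, 

$$\mathbb{E}_{\sigma,\tau}\big[ d_{\mathcal{C}}^\infty(\bar g_n) \big] \le \frac{1}{n}\Big( 2(1+\sqrt{2})\sqrt{(2^k - 1)\log(d)} + \frac{\log(d)}{\eta_k} + \eta_k m \Big) = \frac{\sqrt{\log(d)}}{n}\Big( 2(1+\sqrt{2})\sqrt{2^k - 1} + \sqrt{2^k} + \frac{m}{\sqrt{2^k}} \Big) \le (4 + 2\sqrt{2}) \sqrt{\frac{\log(d)}{n}}.$$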
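Concentration arguments give the bound in high probability. Indeed, the union bound 
implies 

$$\mathbb{P}\big\{ \exists k \le d \text{ s.t. } \big| \bar g_n^k - \mathbb{E}[\bar g_n^k] \big| \ge \varepsilon \big\} \le 2d \exp\Big( -\frac{n\varepsilon^2}{2} \Big),$$

thus the probability that $\|\bar g_n - \mathbb{E}[\bar g_n]\|_\infty$ is smaller than $\sqrt{\frac{2}{n} \log\big(\frac{2d}{\delta}\big)}$ is bigger than $1 - \delta$. 
The result for the orthant is a direct consequence of the triangle inequality. 

We no longer assume that $\mathcal{C}$ is an orthant, but that it is defined by 

$$\mathcal{C} := \big\{ z \in \mathbb{R}^d \text{ s.t. } b_k \le z_k \le c_k,\ \forall k \in \{1, \dots, d\} \big\}.$$

Let $h(x,y) = \big( g_k(x,y) - c_k,\ b_k - g_k(x,y) \big)_{k \in \{1,\dots,d\}} \in \mathbb{R}^{2d}$, then approachability of $\bar g_n$ to $\mathcal{C}$ is 
equivalent to the approachability of $\bar h_n$ to the negative orthant, since $d_{\mathcal{C}}^\infty(\bar g_n) = d_{\mathbb{R}^{2d}_-}^\infty(\bar h_n)$ 
and $g(x,y) \in \mathcal{C}$ if and only if $h(x,y) \in \mathbb{R}^{2d}_-$. The result follows from the bound exhibited 
for the orthant. ■ 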
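
The reduction from a box to an orthant can be checked on sample points: the sup-norm distance of $z$ to the box equals the sup-norm distance of the lifted vector to the negative orthant of $\mathbb{R}^{2d}$. The sketch below is illustrative only; the function names and the sample bounds are hypothetical, and the lifted coordinates are concatenated rather than interleaved, which does not change the sup-norm distance.

```python
def dist_inf_to_box(z, lower, upper):
    """Sup-norm distance from z to the box prod_k [lower_k, upper_k]."""
    return max(max(lo - zk, zk - up, 0.0) for zk, lo, up in zip(z, lower, upper))


def lift(z, lower, upper):
    """Lifted vector (z_k - c_k)_k followed by (b_k - z_k)_k, in dimension 2d."""
    return ([zk - up for zk, up in zip(z, upper)]
            + [lo - zk for zk, lo in zip(z, lower)])


def dist_inf_to_negative_orthant(h):
    """Sup-norm distance from h to the negative orthant."""
    return max(max(hk, 0.0) for hk in h)


lower = (-0.25, -0.5)   # illustrative b_k
upper = (0.25, 0.5)     # illustrative c_k
grid = [(-1.0 + 0.25 * i, -1.0 + 0.25 * j) for i in range(9) for j in range(9)]
gaps = [abs(dist_inf_to_box(z, lower, upper)
            - dist_inf_to_negative_orthant(lift(z, lower, upper)))
        for z in grid]
```

Since the lift is affine, it commutes with averaging, so the same identity applies to the average payoffs.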



1.4.4 From weak approachability to approachability 

Recall that a closed set £ is approachable if the player has a strategy such that after 
some (maybe large) stage A^, the payoffs remains in a small neighborhood of 8. Similarly, 
it is excludable if Nature can enforce the dual: after some stage iV, the payoffs remains 
outside some neighborhood of 8. Blackwell proved that there exists a dichotomy for 
convex sets: they are either approachable or excludable. This is not true for any set, as 
illustrated in the following example, due to Blackwell. 

Consider the set and payoff matrix defined by, with $\mathcal{A} = \{T, B\}$ and $\mathcal{B} = \{L, R\}$,

\[ \mathcal{E} = \{(1/2, y),\ y \in [0, 1/4]\} \cup \{(1, y),\ y \in [1/4, 1]\} \quad\text{and}\quad \begin{array}{c|cc} & L & R \\ \hline T & (1,0) & (1,1) \\ B & (0,0) & (0,0) \end{array} \]

Assume that the strategy of the player dictates to play $T$ during $N$ stages (with $N$ a large even number), then to play either always $T$ or always $B$ during the following $N$ stages, depending on whether Nature has played $R$ more than half of the time during the first $N$ stages.

In the former case, the player gets, after $N$ stages, an average payoff of $(1, y)$ with $y > 1/2$; thus, by continuing to play $T$ for $N$ more stages, he ensures that his average payoff after $2N$ stages is $(1, y')$ with $y' \ge y/2 > 1/4$. In the latter case, the payoff after $N$ stages is $(1, y)$ with $y \le 1/2$, thus the payoff after $2N$ stages is $(1/2, y')$ with $y' = y/2 \le 1/4$.

As a consequence, this strategy guarantees that, after $2N$ stages, the payoff is exactly in $\mathcal{E}$. So if this procedure is applied during $2N_1$ stages, then started over for $2N_2 = 2e^{N_1}$ stages, then started over again for $2N_3 = 2e^{N_2}$ stages and so on, the payoff is infinitely often arbitrarily close to $\mathcal{E}$, which is therefore not excludable.
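The two-phase strategy of this example is easy to simulate. The following sketch is illustrative only: the helper names and the random choice of Nature's moves are assumptions made for the check. It verifies that the average payoff after $2N$ stages lands in the target set.

```python
import random

# Payoff matrix of the example: rows T/B for the player, columns L/R for Nature.
PAYOFF = {("T", "L"): (1.0, 0.0), ("T", "R"): (1.0, 1.0),
          ("B", "L"): (0.0, 0.0), ("B", "R"): (0.0, 0.0)}

def in_target_set(point, tol=1e-12):
    """Membership in {(1/2, y), y in [0, 1/4]} union {(1, y), y in [1/4, 1]}."""
    x, y = point
    left = abs(x - 0.5) <= tol and -tol <= y <= 0.25 + tol
    right = abs(x - 1.0) <= tol and 0.25 - tol <= y <= 1.0 + tol
    return left or right

def average_payoff_after_2n(nature_moves):
    """Play T for N stages, then always T or always B depending on the frequency of R."""
    n = len(nature_moves) // 2
    second_action = "T" if 2 * nature_moves[:n].count("R") > n else "B"
    actions = ["T"] * n + [second_action] * n
    total_x = total_y = 0.0
    for a, b in zip(actions, nature_moves):
        px, py = PAYOFF[(a, b)]
        total_x += px
        total_y += py
    return (total_x / (2 * n), total_y / (2 * n))

random.seed(1)
n = 50
results = [average_payoff_after_2n([random.choice("LR") for _ in range(2 * n)])
           for _ in range(2000)]
all_in_set = all(in_target_set(p) for p in results)
```

The simulation only draws random sequences; the guarantee itself holds for every sequence, since the payoff after $2N$ stages depends only on how often $R$ was played in each half.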

Unfortunately, $\mathcal{E}$ is not approachable; indeed, this would imply that at least one of the two connected (and convex) components of $\mathcal{E}$ is approachable. But neither of them satisfies Blackwell's characterization.

In this example, the player cannot force the payoff to remain close to $\mathcal{E}$, but if he knows in advance that there are only $N$ stages in the game, then he can ensure that, at the terminal stage, the payoff is in $\mathcal{E}$ (or at least, for odd $N$, $1/N$-close to $\mathcal{E}$). A natural weaker concept of approachability emerges: a set $\mathcal{E} \subset \mathbb{R}^d$ is weakly approachable if, given some fixed large length of the game, the player has a strategy such that the terminal average payoff is close to $\mathcal{E}$.

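Definition 1.6 A closed set $\mathcal{E} \subset \mathbb{R}^d$ is weakly approachable if, for every $\varepsilon > 0$, there exists $N \in \mathbb{N}$ such that, in any game of length $n \ge N$, the player has a strategy $\sigma_n$ such that, no matter the strategy $\tau$ of Nature, $\mathbb{E}_{\sigma_n,\tau}\left[d_{\mathcal{E}}(\bar g_n)\right] \le \varepsilon$.

Similarly, $\mathcal{E}$ is weakly excludable if Nature can weakly approach the complement of the $\delta$-neighborhood of $\mathcal{E}$, for some $\delta > 0$.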

We emphasize the fact that, in weak approachability, strategies can depend on the length $n$ of the game, which is not allowed for regular approachability. The question raised by Blackwell and solved by Vieille [77] is whether there exists a dichotomy between weakly approachable and weakly excludable sets.

Theorem 1.9 Any closed set is either weakly approachable or weakly excludable.

Proof: We only sketch here the proof of Vieille [77].

Consider the differential zero-sum game where the player and Nature choose actions $x(t) \in \Delta(\mathcal{A})$ and $y(t) \in \Delta(\mathcal{B})$ in continuous time (actually, even the formal definition of strategies might require precise notations and concepts). In this game, a state variable $G$, which represents the accumulated payoff, evolves according to the dynamic $\dot G(t) = g(x(t), y(t))$ with $G(0) = 0$, between time $t = 0$ and $t = 1$.

In this game, the overall objective of the player is to minimize the terminal payoff $d_{\mathcal{E}}(G(1))$, while Nature maximizes it. The important fact is that one can prove, using techniques and results from differential games, that this game has a value $v$. If $v = 0$, then the player has a strategy such that the cumulated payoff at time $t = 1$ is exactly in $\mathcal{E}$, whereas if $v > 0$, Nature has a strategy such that this cumulated payoff is bounded away from $\mathcal{E}$.

It remains to understand that a game in discrete time with $N$ stages is a discretization (or an approximation) of this differential game and that, as $N$ goes to infinity, this approximation is more and more precise. Therefore, if the player can enforce that $G(1)$ belongs to $\mathcal{E}$, then he can ensure that $\bar g_N$ is arbitrarily close to $\mathcal{E}$ when $N$ is large enough. The converse holds for Nature, hence the result. ■
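Actually, the focus of this section is not only this important (and elegant) result but also the following properties, inspired from Cesa-Bianchi & Lugosi [14] or Rakhlin, Sridharan & Tewari [66]. Given an approachable convex set $\mathcal{C}$, let $\sigma_n$ be an optimal strategy in the $n$-stage zero-sum game with terminal payoff $\mathbb{E}_{\sigma,\tau}\left[d_{\mathcal{C}}(\bar g_n)\right]$ and denote by $v_n$ the value of this game (its existence is not difficult).

We know that $\mathbb{E}\left[d_{\mathcal{C}}(\bar g_n)\right]$ can be upper bounded, using some adequate approachability strategy, by $O\left(\|g\|_\infty/\sqrt{n}\right)$; but it is also obviously lower-bounded by $v_n$. So the computation of $v_n$ could indicate whether the rate is tight or not. On the other hand, exact computation of $v_n$ might be challenging, yet it satisfies

\[ v_n = \min_{\sigma}\max_{\tau}\mathbb{E}_{\sigma,\tau}\left[d_{\mathcal{C}}(\bar g_n)\right] = \max_{\tau}\min_{\sigma}\mathbb{E}_{\sigma,\tau}\left[d_{\mathcal{C}}(\bar g_n)\right] \le \max_{\tau}\min_{\sigma(\tau)}\mathbb{E}_{\sigma,\tau}\left[d_{\mathcal{C}}(\bar g_n)\right], \]

where $\sigma(\tau)$ is the strategy that chooses, given $\tau$ and after the finite history $h^n$, $x_{n+1} = x(\tau(h^n))$. In particular, $\mathbb{E}_{\tau,\sigma(\tau)}[\bar g_n] := \sum_{m=1}^n \mathbb{E}_{\tau,\sigma(\tau)}\left[g_m \,|\, h^{m-1}\right]/n$ belongs to $\mathcal{C}$ and therefore (removing the dependency in $\sigma$)

\[ v_n \le \max_{\tau}\mathbb{E}_{\tau}\left[\left\|\bar g_n - \mathbb{E}_{\tau}[\bar g_n]\right\|\right] \le \sup_{g}\mathbb{E}\left[\frac{\left\|\sum_{m=1}^n g_m - \mathbb{E}\left[g_m \,|\, h^{m-1}\right]\right\|}{n}\right] \le O\left(\frac{\|g\|_\infty}{\sqrt{n}}\right), \]

where the supremum is taken over all sequences $g_m - \mathbb{E}\left[g_m \,|\, h^{m-1}\right]$ of martingale differences with $g_m \in \{g(a,b),\ a \in \mathcal{A},\ b \in \mathcal{B}\}$. The last inequality is a consequence of the Hoeffding-Azuma inequality in Euclidean spaces.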

A question that naturally arises is whether we can concatenate, using the doubling trick, optimal strategies in games of length $2^k$ to construct an approachability strategy of $\mathcal{C}$ (i.e., independent of any horizon $n$). The answer is both no and yes: no with the current definition of $v_n$. Indeed, the only guarantee is that the terminal payoff is $v_{2^k}$-close to $\mathcal{C}$ but, for instance, the payoff at middle stages could be arbitrarily far away.

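On the other hand, since $\mathcal{C}$ is a convex set, we can modify the definition of $v_n$ as follows so that the answer is yes. Define

\[ v'_n := \min_{\sigma}\max_{\tau}\mathbb{E}_{\sigma,\tau}\left[\sup_{m \le n}\frac{m}{n}d_{\mathcal{C}}(\bar g_m)\right] = \max_{\tau}\min_{\sigma}\mathbb{E}_{\sigma,\tau}\left[\sup_{m \le n}\frac{m}{n}d_{\mathcal{C}}(\bar g_m)\right] \le \max_{\tau}\min_{\sigma(\tau)}\mathbb{E}_{\sigma,\tau}\left[\sup_{m \le n}\frac{m}{n}d_{\mathcal{C}}(\bar g_m)\right], \]

so that, using the same arguments and Doob's (or Hoeffding's) maximal inequality,

\[ v'_n \le \max_{\tau}\mathbb{E}_{\tau}\left[\sup_{m \le n}\frac{m}{n}\left\|\bar g_m - \mathbb{E}_{\tau}[\bar g_m]\right\|\right] \le \sup_{g}\mathbb{E}\left[\sup_{m \le n}\frac{\left\|\sum_{s=1}^m g_s - \mathbb{E}\left[g_s \,|\, h^{s-1}\right]\right\|}{n}\right] \le O\left(\frac{\|g\|_\infty}{\sqrt{n}}\right). \]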

Finally, the doubling trick works with this definition of $v'_n$; see the proof of the corollary established above by playing in blocks of lengths $2^k$.

This technique seems void at first sight, but might be useful in some specific examples, as in Proposition 4.2 in Section 4 (see also Remark 4.2). In this case, because of the geometry of $\mathcal{C}$, one has $d_{\mathcal{C}}(z) \le 2\|z - c\|_\infty$ for every $c \in \mathcal{C}$. Then the same tools yield that $v_n$ is smaller than $O\left(\sqrt{\log(d)/n}\right)$, which is negligible compared to $\|g\|_\infty/\sqrt{n} \simeq \sqrt{d/n}$ as the dimension $d$ increases.
2 Regret minimization



Hannan [30] introduced the concept of external regret in repeated two-player games (between a player and Nature, with scalar payoff) in order to define an exogenous criterion to evaluate a strategy in a non-Bayesian framework. In words, the player has no external regret (or his strategy is externally consistent) if, asymptotically, he could not have gained strictly more if he had known, before the beginning of the game, the empirical distribution of moves of Nature. This notion has notably been refined by Foster & Vohra [23] (see also Fudenberg & Levine [28]) into internal regret: a player has no internal regret (or his strategy is internally consistent) if he has no external regret on the set of stages where he played a specific action, as soon as this set is big enough.

2.1 Finite action spaces 

Consider a two-person repeated game where the action spaces of the player and Nature are $\mathcal{A}$ and $\mathcal{B}$ (of cardinalities $A$ and $B$) and $\rho : \mathcal{A} \times \mathcal{B} \to [0, 1]$ is a real payoff mapping. The extension of $\rho$ to $\Delta(\mathcal{A}) \times \Delta(\mathcal{B})$ and strategies are defined as in the previous section.

2.1.1 External regret 

Choices of actions $a_n \in \mathcal{A}$ and $b_n \in \mathcal{B}$ generate a regret $r_n \in \mathbb{R}^A$ defined by

\[ r_n = r(a_n, b_n) := \big(\rho(1, b_n) - \rho(a_n, b_n), \dots, \rho(A, b_n) - \rho(a_n, b_n)\big) \in \mathbb{R}^A. \]

Intuitively, the regret represents the differences between what the player could have got and what he actually got. A player has no external regret if, asymptotically, every component of the average regret is non-positive. In words, this means that the player could not think "if I had known [the empirical distribution of Nature's actions], I would have always played action $a^*$", hence the terminology of regret. Indeed, by linearity of $\rho$,

\[ \bar r_n = \big(\rho(1, \bar b_n) - \bar\rho_n, \dots, \rho(A, \bar b_n) - \bar\rho_n\big) \in \mathbb{R}^A. \]

Given a vector $U \in \mathbb{R}^d$, the notation $U^+$ will stand for the positive part of $U$, i.e., $U^+ = \left(\max\{U^k, 0\}\right)_{k \in \{1,\dots,d\}}$. Similarly, $U^-$ is the negative part of $U$.

Definition 2.1 A strategy $\sigma$ of the player has no external regret if, for every strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \limsup_{n \to \infty}\ \max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n) - \bar\rho_n \le 0, \quad\text{or equivalently,}\quad \limsup_{n \to \infty} \left\|\bar r_n^+\right\|_\infty \le 0. \tag{13} \]
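To make the definition concrete, here is a minimal sketch that computes the average external regret vector from a finite history. The function name, the toy payoff matrix and the toy history are assumptions chosen for illustration.

```python
import numpy as np

def average_external_regret(payoff, actions, states):
    """Return r_bar with r_bar[a*] = rho(a*, b_bar_n) - rho_bar_n.

    payoff[a, b] is the payoff rho(a, b); actions and states are integer arrays.
    """
    payoff = np.asarray(payoff, dtype=float)
    realized = payoff[actions, states].mean()
    # By linearity, averaging rho(a*, b_m) over the stages equals rho(a*, b_bar_n).
    counterfactual = payoff[:, states].mean(axis=1)
    return counterfactual - realized

# Toy history (hypothetical): two actions, two states, payoff 1 when they match.
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])
actions = np.array([0, 0, 1, 1])
states = np.array([0, 1, 1, 1])
r_bar = average_external_regret(payoff, actions, states)
sup_norm_positive_part = float(np.maximum(r_bar, 0.0).max())
```

On this history the realized average payoff is 0.75, which matches the best fixed action, so the positive part of the average regret vanishes.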

The existence of externally consistent strategies goes back to Hannan [30]. However, the following theorem, with rates of convergence independent of Nature's strategy, is due to Cesa-Bianchi & Lugosi [14].

Theorem 2.1 There exists an externally consistent strategy $\sigma$ such that, no matter the strategy $\tau$ of Nature and for every $n \in \mathbb{N}$,

\[ \mathbb{E}_{\sigma,\tau}\left[\max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n) - \bar\rho_n\right] \le 2\sqrt{\frac{\log(A)}{n}}. \]

We will not yet provide a proof of this result; instead, we will show a weaker result, following Zinkevich [83]. The basic idea is to notice that the overall objective is to maximize the function $\rho(\cdot, \bar b_n)$, which is linear (hence both convex and concave), and therefore to apply any convex optimization technique, for example a gradient descent.

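Proof: First, we claim that for every $n \in \mathbb{N}$, there exists a strategy $\sigma_n$ (that depends on $n$) such that

\[ \mathbb{E}_{\sigma_n,\tau}\left[\max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n) - \bar\rho_n\right] \le \sqrt{\frac{A}{n}}. \]

Let $\eta$ be a parameter to be fixed later and define, for every $m \le n$, the strategy $\sigma_n$ following a usual gradient descent:

\[ x'_{m+1} = x_m + \eta\,\rho(\cdot, b_m) \quad\text{and}\quad x_{m+1} = \Pi_{\Delta(\mathcal{A})}\left(x'_{m+1}\right), \quad\text{with } x_1 = \left(\tfrac{1}{A}, \dots, \tfrac{1}{A}\right); \]

the projection step ensures that $x_{m+1}$ stays in $\Delta(\mathcal{A})$. Simple calculations show that, for every $a \in \mathcal{A}$,

\begin{align*}
\sum_{m=1}^n \rho(a, b_m) - \rho(x_m, b_m) &= \sum_{m=1}^n \left\langle a - x_m, \rho(\cdot, b_m)\right\rangle \\
&= \frac{1}{2\eta}\sum_{m=1}^n \left(\|a - x_m\|^2 + \|x_m - x'_{m+1}\|^2 - \|a - x'_{m+1}\|^2\right) \\
&\le \frac{1}{2\eta}\sum_{m=1}^n \left(\|a - x_m\|^2 + \eta^2\|\rho(\cdot, b_m)\|^2 - \|a - x_{m+1}\|^2\right) \\
&\le \frac{\|a - x_1\|^2}{2\eta} + \frac{\eta}{2}\sum_{m=1}^n \|\rho(\cdot, b_m)\|^2 \le \frac{1}{2\eta} + \frac{\eta n A}{2}.
\end{align*}

Balancing the two terms by choosing $\eta = 1/\sqrt{nA}$ proves the claim. We stress the fact that this strategy ensures that, at stage $t$, the regret is bounded as

\[ \mathbb{E}_{\sigma,\tau}\left[\sum_{m=1}^t \rho(a, b_m) - \rho_m\right] \le \frac{\sqrt{nA}}{2} + \frac{t}{2}\sqrt{\frac{A}{n}}, \]

which might be considerably bigger than $\sqrt{tA}$ for small $t$, but this uniform guarantee allows the use of a doubling trick, as in the block argument of Section 1, to conclude. ■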
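The projected gradient strategy used in this proof can be sketched as follows. This is only an illustration under stated assumptions: the sort-based simplex projection is one standard way to compute the Euclidean projection (the text does not specify one), the payoff vectors are drawn at random, and all names are hypothetical.

```python
import numpy as np

def project_onto_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = np.sort(v)[::-1]
    cumulative = np.cumsum(u) - 1.0
    indices = np.arange(1, len(v) + 1)
    k = np.nonzero(u - cumulative / indices > 0)[0][-1]
    threshold = cumulative[k] / (k + 1)
    return np.maximum(v - threshold, 0.0)

def gradient_strategy_regret(payoff_vectors):
    """Run x_{m+1} = Proj(x_m + eta * rho(., b_m)) and return the cumulative expected regret."""
    n, A = payoff_vectors.shape
    eta = 1.0 / np.sqrt(n * A)          # step size balancing the two terms of the bound
    x = np.full(A, 1.0 / A)             # x_1 is the uniform distribution
    expected_payoff = 0.0
    for u in payoff_vectors:
        expected_payoff += float(x @ u)
        x = project_onto_simplex(x + eta * u)
    best_fixed = payoff_vectors.sum(axis=0).max()
    return best_fixed - expected_payoff

# Hypothetical sequence of payoff vectors rho(., b_m) in [0, 1]^A.
rng = np.random.default_rng(0)
n, A = 2000, 5
payoffs = rng.uniform(0.0, 1.0, size=(n, A))
regret = gradient_strategy_regret(payoffs)
bound = np.sqrt(n * A)
```

The step size depends on the horizon $n$, which is why the strategy is only a building block and has to be combined with a doubling trick to obtain a horizon-free guarantee.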
To get the $\log(A)$ term instead of $A$ in the upper bound, one just has to follow the algorithm known as the exponential weight algorithm, defined by:

\[ x_{n+1}[a] = \frac{\exp\left(\eta_n\,\rho(a, \bar b_n)\right)}{\sum_{a' \in \mathcal{A}}\exp\left(\eta_n\,\rho(a', \bar b_n)\right)}, \quad\text{where } \eta_n = \sqrt{8n\log(A)}, \]

see, e.g., Littlestone & Warmuth [47], Vovk [79] or Auer, Cesa-Bianchi & Gentile [3].
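A minimal sketch of the exponential weight algorithm is given below. It assumes the reading $\eta_n = \sqrt{8n\log(A)}$ applied to average payoffs, which is the same as multiplying cumulative payoffs by $\sqrt{8\log(A)/n}$; the random payoff sequence and the function name are assumptions made for the illustration, and the final comparison with $2\sqrt{n\log(A)}$ is an empirical check rather than a proof.

```python
import numpy as np

def exponential_weights_regret(payoff_vectors):
    """Play exponential weights against a fixed sequence and return the cumulative expected regret."""
    n, A = payoff_vectors.shape
    cumulative = np.zeros(A)
    expected_payoff = 0.0
    for stage, u in enumerate(payoff_vectors):
        if stage == 0:
            x = np.full(A, 1.0 / A)
        else:
            # eta_n * rho(a, b_bar_n) with eta_n = sqrt(8 n log A) equals
            # sqrt(8 log(A) / n) times the cumulative payoff of action a.
            scores = np.sqrt(8.0 * np.log(A) / stage) * cumulative
            weights = np.exp(scores - scores.max())   # shift for numerical stability
            x = weights / weights.sum()
        expected_payoff += float(x @ u)
        cumulative += u
    return cumulative.max() - expected_payoff

rng = np.random.default_rng(0)
n, A = 2000, 5
payoffs = rng.uniform(0.0, 1.0, size=(n, A))
regret = exponential_weights_regret(payoffs)
reference = 2.0 * np.sqrt(n * np.log(A))
```

Because the learning rate is recomputed at every stage, no horizon has to be known in advance and no doubling trick is needed.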

The following corollary shows that the previous result can be extended to the compact case. Actually, the proof is exactly the same, since it did not use the fact that $\mathcal{B}$ is finite, and is thus omitted.

Corollary 2.1 Assume that Nature chooses at every stage an outcome vector $U_n$ in a compact set $\mathcal{U} \subset [0,1]^A$ such that the player's payoff at this stage is $\rho_n = U_n^{a_n}$. Then there exists a strategy $\sigma$ such that, no matter the strategy $\tau$ of Nature and for every $n \in \mathbb{N}$,

\[ \mathbb{E}_{\sigma,\tau}\left[\max_{a^* \in \mathcal{A}} \bar U_n^{a^*} - \bar\rho_n\right] \le 2\sqrt{\frac{\log(A)}{n}}. \]

Theorem 2.1 and Corollary 2.1 can actually be proved using more complex optimization procedures, such as mirror descent instead of gradient descent (see, e.g., Rakhlin [64] or Bubeck [12] for a survey on the use of these techniques in machine learning), and without using the doubling trick. We will, on the contrary, prove them using approachability theory.



Remark 2.1 In Section 1.4.4 we claimed that we could not use a doubling trick. It was possible here because any strategy $\sigma_n$, although only optimal at the final stage $n$, ensures relatively good performance at all stages. For instance, at the specific stage $t = n/2$, the regret is bounded by $\frac{3}{2\sqrt{2}}\sqrt{A/t} \simeq 1.06\sqrt{A/t}$. This was not the case in the previous section, where the distance to the set could be of the order of a constant.

More specifically, strategies $\sigma_n$ are somehow equivalent to weak approachability (only the final stage matters). If we could always concatenate strategies using a doubling trick to output a strategy that behaves well at all stages, then we could construct approachability strategies from weak approachability strategies. This would mean that any set is either approachable or excludable, which is not true in general (in fact, as proved in Section 4.1, regret corresponds more to the approachability of convex sets, on which weak and regular approachability coincide).

A usual criticism of the notion of regret in games (and this could lead to long and probably unfruitful debates) is that a player compares his payoff with the payoff he would have got if he had always played the pure action $a^*$. However, if he had played something else, then Nature would (or at least could) have chosen a totally different sequence $b_n$, so the comparison is meaningless. An easy and unsatisfactory answer is to say that a player's action does not change the behavior of Nature (as in the literature on learning with expert advice, see Cesa-Bianchi & Lugosi [14]). A less unsatisfactory answer consists in stating that, since there is absolutely no prior on Nature, it is impossible to infer anything about her strategy had the world been different. So we should compare the payoff with respect to the best information available, which is the current sequence.

Let us develop a third point of view, based on game theoretic perspectives. The basic idea is that regret is not a criterion to compare different strategies: it does not say that a strategy without regret is a better strategy than always playing $a^*$. In our repeated game, the player maximizes his cumulated payoff without any structural assumption on Nature. Therefore, he can just sequentially formulate predictions about her behavior (we purposely remain vague on this subject) and play a best response to them. Regret is a simple measure of how correct a sequence of predictions is. A large regret would mean that the player was wrong most of the time.



2.1.2 Internal and $\Phi$-regret

The notion of external regret has been refined by Foster & Vohra [23] into the so-called internal regret. In words, a player has no internal regret (or his strategy is internally consistent) if he has no external regret on the set of stages on which he chose a specific given action.

Formally, choices of actions $a_n \in \mathcal{A}$ and $b_n \in \mathcal{B}$ generate, besides an external regret $r_n$, an internal regret $R_n$ which is an $A \times A$ matrix whose rows are null except the $a_n$-th one, which is $r_n$; stated otherwise,

\[ R_n^{a,a'} = \begin{cases} \rho(a', b_n) - \rho(a_n, b_n) & \text{if } a = a_n, \\ 0 & \text{otherwise.} \end{cases} \]

Let us introduce here some notations. Given two sequences $\ell_n \in \mathbb{R}^d$ and $a_n \in \mathcal{A}$, recall that $\bar\ell_n$ denotes the average up to stage $n$. We define, for every $a \in \mathcal{A}$, the following subset of stages and conditional averages:

\[ \mathbb{N}_n[a] := \left\{m \in \{1, \dots, n\} \text{ s.t. } a_m = a\right\} \quad\text{and}\quad \bar\ell_n[a] := \frac{1}{|\mathbb{N}_n[a]|}\sum_{m \in \mathbb{N}_n[a]} \ell_m. \]

Definition 2.2 A strategy $\sigma$ is internally consistent if, no matter the strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \limsup_{n \to \infty} \left\|\bar R_n^+\right\|_\infty \le 0 \quad\text{or equivalently}\quad \limsup_{n \to \infty}\ \max_{a \in \mathcal{A}} \frac{|\mathbb{N}_n[a]|}{n}\left(\max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n[a]) - \bar\rho_n[a]\right) \le 0. \]

It is compulsory to multiply the regret accumulated on $\mathbb{N}_n[a]$ by the frequency of action $a$, namely $|\mathbb{N}_n[a]|/n$. Otherwise, internally consistent strategies would not exist. However, another possible formulation (see Lehrer & Solan [45]) is to require that

\[ \limsup_{n \to \infty}\ \max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n[a]) - \bar\rho_n[a] \le 0, \quad\text{for every action } a \in \mathcal{A} \text{ s.t. } \lim_{n \to \infty} |\mathbb{N}_n[a]| = \infty, \]

but, unfortunately, this definition does not allow one to measure internal regret at a given finite stage $n$.
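As an illustration, the average internal regret matrix can be accumulated stage by stage: only the row of the action actually played is updated. The sketch below uses hypothetical names and a toy history; it also reads the external regret off the matrix by summing each column.

```python
import numpy as np

def average_internal_regret(payoff, actions, states):
    """R_bar[a, a'] = (1/n) * sum over the stages where a was played of rho(a', b_m) - rho(a, b_m)."""
    payoff = np.asarray(payoff, dtype=float)
    A = payoff.shape[0]
    n = len(actions)
    R = np.zeros((A, A))
    for a, b in zip(actions, states):
        R[a, :] += payoff[:, b] - payoff[a, b]   # only the row of the played action changes
    return R / n

# Toy history (hypothetical): payoff 1 when action and state match.
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])
actions = np.array([0, 0, 1, 1])
states = np.array([0, 1, 1, 1])
R_bar = average_internal_regret(payoff, actions, states)
external_from_internal = R_bar.sum(axis=0)       # column sums give the average external regret
internal_sup_norm = float(np.maximum(R_bar, 0.0).max())
```

Dividing by $n$ rather than by the number of stages on which each action was played builds the frequency weight $|\mathbb{N}_n[a]|/n$ directly into the matrix.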

The existence part of the following theorem is first due to Foster & Vohra [23]; rates of convergence (constants are not optimal, see e.g. Stoltz & Lugosi [75]) can be inferred from rates of external regret, as shown in Section 2.1.3, where the proof is postponed.

Theorem 2.2 There exist internally consistent strategies such that, for every $n \in \mathbb{N}$,

\[ \mathbb{E}_{\sigma,\tau}\left[\max_{a \in \mathcal{A}} \frac{|\mathbb{N}_n[a]|}{n}\left(\max_{a^* \in \mathcal{A}} \rho(a^*, \bar b_n[a]) - \bar\rho_n[a]\right)\right] = \mathbb{E}_{\sigma,\tau}\left[\left\|\bar R_n^+\right\|_\infty\right] \le 3\sqrt{\frac{\log(A)}{n}}. \]



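Regret has been refined further by Blum & Mansour into swap-regret (or $\Phi$-regret). Define, for every mapping $\phi : \mathcal{A} \to \mathcal{A}$, family $\Phi \subset \{\phi : \mathcal{A} \to \mathcal{A}\}$ and $n \in \mathbb{N}$,

\[ \bar R_n^{\phi} := \frac{1}{n}\sum_{m=1}^n \rho(\phi(a_m), b_m) - \rho(a_m, b_m) \quad\text{and}\quad \bar R_n^{\Phi} = \left(\bar R_n^{\phi}\right)_{\phi \in \Phi} \in \mathbb{R}^{|\Phi|}. \]

Definition 2.3 A strategy $\sigma$ has no $\Phi$-regret if, no matter the strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \limsup_{n \to \infty} \left\|\left(\bar R_n^{\Phi}\right)^+\right\|_\infty \le 0 \quad\text{or equivalently}\quad \limsup_{n \to \infty}\ \max_{\phi \in \Phi} \frac{\sum_{m=1}^n \rho(\phi(a_m), b_m) - \rho(a_m, b_m)}{n} \le 0. \]

Existence of such strategies is due to Blum & Mansour [11]; proofs are again delayed.

Theorem 2.3 There exist strategies without $\Phi$-regret such that, for every $n \in \mathbb{N}$,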

R^l 



< 3 



n 



The notion of $\Phi$-regret is a refinement of respectively external and internal regret, because of the specific choices of families $\Phi_E := \left\{\phi_{a^*};\ \forall a^* \in \mathcal{A},\ \phi_{a^*}(a) = a^*,\ \forall a \in \mathcal{A}\right\}$ or $\Phi_I := \left\{\phi_{a',a^*};\ \forall a', a^* \in \mathcal{A},\ \phi_{a',a^*}(a') = a^* \text{ and } \phi_{a',a^*}(a) = a \text{ if } a \ne a'\right\}$. Proposition 2.2 links the different aforementioned quantities, and shows that minimizing internal regret is, in some sense, enough to minimize each one of them (up to the cost of a factor $A$). We will need the following notation.

Given a family $\Phi \subset \{\phi : \mathcal{A} \to \mathcal{A}\}$, we define the matrix $H^{\Phi}$ of size $|\Phi| \times A^2$ by

\[ H^{\Phi}_{\phi,(a,a')} = \begin{cases} 1 & \text{if } \phi(a) = a', \\ 0 & \text{otherwise.} \end{cases} \]

Proposition 2.2 Given any family $\Phi \subset \{\phi : \mathcal{A} \to \mathcal{A}\}$, one has $\bar R_n^{\Phi} = H^{\Phi}\bar R_n$, where $\bar R_n$ is seen as a vector of size $A^2$. As a consequence, $\left\|\left(\bar R_n^{\Phi}\right)^+\right\|_\infty \le A\left\|\bar R_n^+\right\|_\infty$.

For the specific case of external regret, one also has $\bar r_n = \bar R_n^{\top}\mathbf{1}$ (where $\bar R_n$ is seen as a matrix and $\mathbf{1}$ is a vector with only ones). The converse is not true, as there exist externally consistent strategies with linear internal regret.

Proof: The proof of the first part follows directly from the definitions of internal and $\Phi$-regret. For the existence of externally consistent strategies with linear internal regret, we refer to Stoltz & Lugosi [75]. ■
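The linear relation between internal regret and $\Phi$-regret can be verified numerically. In the sketch below, every name, the random history and the choice of the family (all maps from the action set to itself) are assumptions made for the illustration; the matrix is indexed by pairs $(a, a')$ flattened in row-major order.

```python
import itertools
import numpy as np

def swap_matrix(family, A):
    """Row phi, column (a, a') holds 1 if phi(a) = a' and 0 otherwise."""
    H = np.zeros((len(family), A * A))
    for row, phi in enumerate(family):
        for a in range(A):
            H[row, a * A + phi[a]] = 1.0
    return H

rng = np.random.default_rng(0)
A, n = 3, 200
payoff = rng.uniform(size=(A, A))
actions = rng.integers(A, size=n)
states = rng.integers(A, size=n)

# Average internal regret matrix, accumulated on the row of the played action.
R_bar = np.zeros((A, A))
for a, b in zip(actions, states):
    R_bar[a, :] += (payoff[:, b] - payoff[a, b]) / n

family = list(itertools.product(range(A), repeat=A))   # each tuple encodes a map phi
phi_regret_from_matrix = swap_matrix(family, A) @ R_bar.reshape(-1)
phi_regret_direct = np.array([
    np.mean([payoff[phi[a], b] - payoff[a, b] for a, b in zip(actions, states)])
    for phi in family
])
```

Each row of the matrix contains exactly $A$ ones, which is where the factor $A$ in the comparison of sup norms comes from.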
Further refinements of these concepts can be made, following this time Fudenberg & Levine [28] and Lehrer [43], in two different directions. The first one is to assume that regret is computed not at every stage, but only on a restricted subset of stages (that might depend on the history), and the second direction is to consider time-varying switch mappings $\phi$. Formally, let $\mathcal{X}$ be an activation function, i.e., $\mathcal{X} : H \times \mathcal{A} \to \{0, 1\}$, where $\mathcal{X}(h^n, a_{n+1}) = 1$ indicates that stage $n+1$ is active. We recall that $H$ stands for the set of all finite histories. A switch function $\phi : H \times \mathcal{A} \to \mathcal{A}$ indicates that, after the finite history $h^n$, $\rho(a_{n+1}, b_{n+1})$ will be compared to $\rho(\phi(h^n, a_{n+1}), b_{n+1})$.

Definition 2.4 Given an activation mapping $\mathcal{X}$ and a switch mapping $\phi$, a strategy $\sigma$ has no $(\mathcal{X}, \phi)$-regret if, no matter the strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \limsup_{n \to \infty} \frac{\sum_{m=1}^n \mathcal{X}(h^{m-1}, a_m)\left[\rho\left(\phi(h^{m-1}, a_m), b_m\right) - \rho(a_m, b_m)\right]}{\sum_{m=1}^n \mathcal{X}(h^{m-1}, a_m)} \le 0, \]

as soon as $\sum_{m=1}^n \mathcal{X}(h^{m-1}, a_m)$ converges to $+\infty$.

Lehrer [43] has proved that, given a probability $\lambda$ on the whole set of pairs of activation-switch mappings (endowed with the product topology), there exists a strategy without $(\mathcal{X}, \phi)$-regret, for $\lambda$-almost all pairs. However, rates of convergence are not explicit, in part because we divide the score by the number of active stages $\sum_{m=1}^n \mathcal{X}(h^{m-1}, a_m)$ and not by $n$.

2.1.3 Reductions: from external to $\Phi$-regret

In this section, we show how to construct a strategy with no $\Phi$-regret based on an algorithm that only outputs externally consistent strategies, developing an idea of Stoltz & Lugosi [75] and recovering the more general result of Blum & Mansour. Indeed, consider the following auxiliary game where the action sets of the player and Nature are respectively $\Phi$ and a compact subset $\mathcal{U} \subset [0,1]^A$. Given an exogenous sequence $p_n \in \Delta(\mathcal{A})$, we define the payoff at stage $n$ of the player generated by the choices of $\phi \in \Phi$ and $U_n$ by

\[ V_n^{\phi} := \sum_{a \in \mathcal{A}} p_n[a]\,U_n^{\phi(a)}. \]

Let $\theta$ be an externally consistent strategy and let $\theta_n^{\phi}$ denote the weight put by $\theta$ on $\phi$ at stage $n$. Then the expected external regret at this stage is written as

\begin{align*}
r_n^{\phi} = V_n^{\phi} - \mathbb{E}_{\theta}\left[V_n\right] &= \sum_{a \in \mathcal{A}} p_n[a]\,U_n^{\phi(a)} - \sum_{\phi' \in \Phi}\theta_n^{\phi'}\sum_{a \in \mathcal{A}} p_n[a]\,U_n^{\phi'(a)} \\
&= \sum_{a \in \mathcal{A}} p_n[a]\,U_n^{\phi(a)} - \sum_{a \in \mathcal{A}}\left(\sum_{\phi' \in \Phi}\theta_n^{\phi'}\,p_n \circ \phi'^{-1}[a]\right)U_n^{a}.
\end{align*}

On the other hand, the strategy that dictates to play $p_n$ at stage $n$ in the original game suffers an expected $\Phi$-regret defined by

\[ \mathbb{E}\left[R_n^{\phi}\right] = \sum_{a \in \mathcal{A}} p_n[a]\,U_n^{\phi(a)} - \sum_{a \in \mathcal{A}} p_n[a]\,U_n^{a}. \]

So, as soon as $p_n[a] = \sum_{\phi \in \Phi}\theta_n^{\phi}\,p_n \circ \phi^{-1}[a]$ for every $a \in \mathcal{A}$, $\Phi$-regret in the original game and external regret in the auxiliary game coincide exactly (in expectation). Since the latter converges to zero, so does the former, at the same speed, i.e., at the rates indicated by Theorem 2.1 and Corollary 2.1.

The existence of such a $p_n$ is a simple consequence of Brouwer's fixed point theorem. Indeed, first, notice that $\theta_n$ depends only on the past observations, thus is independent of $p_n$. As a consequence, $p_n$ can be taken as any fixed point of the continuous mapping $p \mapsto \sum_{\phi \in \Phi}\theta_n^{\phi}\,p \circ \phi^{-1}$ from the simplex $\Delta(\mathcal{A})$ to itself.

We have only proved the convergence of $\Phi$-regret in expectation; as usual, almost sure convergence is a consequence of concentration inequalities (or see Theorem 2.7, page 47 and Example 1, page 19, in Hall & Heyde [29]).
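In the finite case, the fixed point can be computed explicitly, because the mapping $p \mapsto \sum_{\phi}\theta^{\phi}\,p \circ \phi^{-1}$ is linear: it is multiplication by a row-stochastic matrix, so a fixed point is a stationary distribution. The sketch below is one possible way to compute it, by least squares; the function names, the random weights and the choice of the family are assumptions made for the illustration.

```python
import itertools
import numpy as np

def pushforward_matrix(family, weights, A):
    """Q[a, a'] = total weight of the maps phi with phi(a) = a' (a row-stochastic matrix)."""
    Q = np.zeros((A, A))
    for phi, w in zip(family, weights):
        for a in range(A):
            Q[a, phi[a]] += w
    return Q

def fixed_point(Q):
    """Solve p = p Q with p in the simplex, written as a least-squares problem."""
    A = Q.shape[0]
    system = np.vstack([Q.T - np.eye(A), np.ones((1, A))])
    target = np.concatenate([np.zeros(A), [1.0]])
    p, *_ = np.linalg.lstsq(system, target, rcond=None)
    p = np.clip(p, 0.0, None)          # remove tiny negative entries due to rounding
    return p / p.sum()

rng = np.random.default_rng(0)
A = 3
family = list(itertools.product(range(A), repeat=A))   # each tuple encodes a map phi
theta = rng.dirichlet(np.ones(len(family)))             # hypothetical weights on the maps
Q = pushforward_matrix(family, theta, A)
p = fixed_point(Q)
```

The identity $(pQ)[a'] = \sum_{\phi}\theta^{\phi}\sum_{a:\phi(a)=a'}p[a]$ is exactly the fixed point condition of the text, so Brouwer's theorem is only needed in more general settings.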



2.2 Compact action spaces, generalizations and examples 
2.2.1 Compact action spaces 

Although $\Phi$-regret can be seen as a consequence of external or internal regret in the finite case (when $\mathcal{A}$ is finite), its introduction is more useful in the following compact case.

Assume that $\mathcal{A}$, the action space of the player, is no longer finite but a compact subset of some Euclidean space. On the other side, $\mathcal{U}$, the action space of Nature, is a subset of mappings from $\mathcal{A}$ to $\mathbb{R}$. Choices of $a_n \in \mathcal{A}$ and $U_n \in \mathcal{U}$ generate, at stage $n$, a payoff of $\rho_n := U_n(a_n)$.

External regret is defined almost exactly as before, i.e., $r_n : \mathcal{A} \to \mathbb{R}$ is a continuous mapping defined by $r_n(a) = U_n(a) - \rho_n$. In the compact case, we must however be careful with the order of quantifiers when passing to limits: a strategy $\sigma$ is externally consistent if, for every strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \sup_{a^* \in \mathcal{A}}\ \limsup_{n \to \infty}\ \bar U_n(a^*) - \bar\rho_n \le 0; \quad\text{equivalently,}\quad \left\|\limsup_{n \to \infty} \bar r_n^+\right\|_\infty \le 0. \]

Remark 2.2 We claimed that the order of quantifiers has some importance. Assume that $\mathcal{A} = [0, 1]$ and that, for every $n \in \mathbb{N}$ and $a \in [0, 1]$, $U_n(a) = \mathbb{1}\{a \in (0, 1/n)\}$. Always choosing the same fixed action $a^*$ gives zero as an asymptotic average payoff, therefore the strategy that plays $a_n = 0$ should not have any regret (neither external, internal, nor $\Phi$ for that matter).

On the other hand, for every $N \in \mathbb{N}$, the choice of $a^* = 1/(2N)$ gives $U_N(a^*) = 1$, thus $\limsup_{n \to \infty}\sup_{a^* \in \mathcal{A}} \bar U_n(a^*) - \bar\rho_n = 1$. This explains the choice of the order of quantifiers in the definition.
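The role of the order of quantifiers in the compact case can be seen on a small computation with the mappings $U_n(a) = \mathbb{1}\{a \in (0, 1/n)\}$. The sketch below is illustrative; the function names and the particular values of the horizon and of the fixed action are assumptions.

```python
def payoff(n, a):
    """U_n(a) = 1 if 0 < a < 1/n, and 0 otherwise."""
    return 1.0 if 0.0 < a < 1.0 / n else 0.0

def average_payoff_of_fixed_action(a_star, n):
    """Average of U_m(a_star) over the stages m = 1, ..., n."""
    return sum(payoff(m, a_star) for m in range(1, n + 1)) / n

n = 10_000
# A fixed comparison action is rewarded only on finitely many stages (here, m < 100).
fixed_action_average = average_payoff_of_fixed_action(0.01, n)
# A comparison action chosen after the horizon is known is rewarded at every stage.
horizon_dependent_average = average_payoff_of_fixed_action(1.0 / (2 * n), n)
```

With the supremum outside the limit, every fixed action has a vanishing average payoff, so playing $a_n = 0$ has no regret; with the supremum inside, the comparison would always equal one.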
Difficulties arise when defining internal regret, because scores are multiplied by frequencies of actions in the finite case. We shall instead only focus on $\Phi$-regret, whose definition is also identical: $R_n : \Phi \to \mathbb{R}$ is a mapping defined by $R_n(\phi) = U_n(\phi(a_n)) - U_n(a_n)$. A strategy $\sigma$ has no $\Phi$-regret if, for every strategy $\tau$ of Nature, $\mathbb{P}_{\sigma,\tau}$-almost surely,

\[ \sup_{\phi \in \Phi}\ \limsup_{n \to \infty}\ \frac{1}{n}\sum_{m=1}^n U_m(\phi(a_m)) - U_m(a_m) \le 0, \quad\text{or equivalently,}\quad \left\|\limsup_{n \to \infty} \bar R_n^+\right\|_\infty \le 0. \]

If $\mathcal{A}$ is not compact but $(\mathcal{A}, \mathcal{T}, \mu)$ is a probability space, then external and $\Phi$-regret can also be defined $\mu$-almost surely. The supremum over $\Phi$ is simply replaced by "for $\mu$-almost every mapping $\phi \in \Phi$".

2.2.2 Generalizations 

The whole concept of regret minimization can be extended beyond the comparison of averages of scalar payoffs. Let $g : \mathcal{A} \times \mathcal{B} \to \mathbb{R}^d$ be a vector valued payoff mapping and define a sequence of evaluation mappings $B_n : (\mathbb{R}^d)^n \to \mathbb{R}$ and a class $\Xi$ of departure sequences $\xi[g] : \mathcal{A} \times \mathcal{B} \to \mathbb{R}^d$. Then a strategy $\sigma$ has no generalized regret if

\[ \sup_{\xi \in \Xi}\ \limsup_{n \to \infty}\ B_n\big(\xi[g](a_1, b_1), \dots, \xi[g](a_n, b_n)\big) - B_n\big(g(a_1, b_1), \dots, g(a_n, b_n)\big) \le 0, \]

almost surely, no matter the strategy of Nature.

Of course, without additional assumptions on the sequences $B_n$ and $\Xi$, generalized regret cannot be minimized. Rakhlin, Sridharan & Tewari [66] have used min-max techniques to infer the existence of such strategies (associated with rates of convergence) in specific cases:

i) External, internal and $\Phi$-regret are obtained if $g = \rho$, $B_n(z_1, \dots, z_n) = \frac{1}{n}\sum_{m=1}^n z_m$ and, for every $\phi \in \Phi$, there exists $\xi \in \Xi$ such that $\xi[g](a, b) = \rho(\phi(a), b)$.

ii) Approachability of a convex set $\mathcal{C}$ is obtained if $B_n(z_1, \dots, z_n) = -d_{\mathcal{C}}\left(\frac{1}{n}\sum_{m=1}^n z_m\right)$ and departure mappings are $\xi[g](a, b) \in \mathcal{C}$ for any $a \in \mathcal{A}$ and $b \in \mathcal{B}$.

iii) When $B_n$ is a function of the average, i.e., $B_n(z_1, \dots, z_n) = G\left(\frac{1}{n}\sum_{m=1}^n z_m\right)$, an interesting (yet maybe counterintuitive) property arises even in the finite case. There might exist strategies that are not externally consistent yet internally consistent, in the sense that

\[ \limsup_{n \to \infty}\ \max_{x^*} G\left[\rho(x^*, \bar b_n)\right] - G\left[\bar\rho_n\right] > 0 \]

but, for every $a \in \mathcal{A}$,

\[ \limsup_{n \to \infty}\ \frac{|\mathbb{N}_n[a]|}{n}\left[\sup_{x^*} G\big(\rho(x^*, \bar b_n[a])\big) - G\big(\bar\rho_n[a]\big)\right] \le 0. \]
2.2.3 Experts

An interpretation (which is actually also a generalization) of these results concerns games of prediction with expert advice, studied (almost exhaustively) by Cesa-Bianchi & Lugosi [14]. At each stage $n \in \mathbb{N}$, an agent must take a decision $d_n$ in some topological convex and compact set $\mathcal{D}$. He is advised by a pool $\mathcal{E}$ of experts, i.e., expert $e$ suggests to choose the decision $d_n^e$ at this stage. Once his choice is made, Nature reveals the state of the world $s_n \in \mathcal{S}$ (where $\mathcal{S}$ is some arbitrary space), which generates a loss $L_n := L(d_n, s_n)$.

After $n$ stages, the agent has suffered an average loss of $\bar L_n = \frac{1}{n}\sum_{m=1}^n L(d_m, s_m)$, while the best expert has incurred an average loss of $\bar L_n^* = \frac{1}{n}\min_{e \in \mathcal{E}}\sum_{m=1}^n L(d_m^e, s_m)$. An evaluation criterion of a strategy of an agent compares these two quantities, as was done by Auer, Cesa-Bianchi & Gentile [3].

Corollary 2.3 If $L$ is convex and has values in $[0, 1]$, then there exists an algorithm such that

\[ \bar L_n - \bar L_n^* \le 2\sqrt{\frac{\log(|\mathcal{E}|)}{n}}. \]

Proof: Consider an externally consistent strategy $\sigma$ given by Theorem 2.1, where the action set is the set of experts and the payoff at stage $n$ of choosing expert $e$ is $\rho(e, s_n) = -L(d_n^e, s_n)$. Denote by $x_{n+1} \in \Delta(\mathcal{E})$ the mixed action dictated by $\sigma$ at stage $n+1$. It induces the decision $d_{n+1} = \sum_{e \in \mathcal{E}} x_{n+1}[e]\,d_{n+1}^e$, which satisfies, by convexity of $L$:

\[ L(d_{n+1}, s_{n+1}) \le \sum_{e \in \mathcal{E}} x_{n+1}[e]\,L(d_{n+1}^e, s_{n+1}) = -\mathbb{E}_{\sigma, s_{n+1}}\left[\rho(e, s_{n+1})\right]. \]

Therefore $\bar L_n - \bar L_n^*$ is smaller than the expected regret of $\sigma$, hence the result.

2.3 Links with game theory
2.3.1 Regret and sets of equilibria

Existence of consistent strategies can be used to prove classical game theory results: non-emptiness of Hannan (or correlated) sets and min-max theorems, as noticed by Blum & Mansour and Cesa-Bianchi & Lugosi [14].

Consider a game between a set of players $\mathcal{I}$ of size $I$, where $\mathcal{A}_i$ denotes the finite action space of player $i$ and $\rho_i : \prod_{j \in \mathcal{I}} \mathcal{A}_j \to \mathbb{R}$ his payoff function (extended multi-linearly as usual). The Hannan set of player $i$ is the subset of joint distributions of actions defined by

\[ \mathcal{H}_i := \left\{q \in \Delta\Big(\prod_{j \in \mathcal{I}} \mathcal{A}_j\Big) \text{ s.t. } \max_{a^* \in \mathcal{A}_i} \rho_i(a^*, q_{-i}) \le \rho_i(q)\right\}, \]

where $\rho_i(q) = \mathbb{E}_q[\rho_i]$ and $q_{-i}$ is the marginal of $q$ on $\prod_{j \ne i} \mathcal{A}_j$, i.e., the empirical joint distribution of actions played by the opponents of player $i$. Informally, a joint distribution $q$ belongs to $\mathcal{H}_i$ if player $i$ has no interest in always playing a fixed action $a^* \in \mathcal{A}_i$ if his opponents coordinate to play according to $q_{-i}$.
By linearity of $\rho_i$, if a strategy of player $i$ is externally consistent (independently of the behavior of his opponents), then necessarily the empirical joint distribution of actions converges to $\mathcal{H}_i$. We qualify this property as unilateral, as it does not make any assumption on the opponents' strategies.

If every player follows unilaterally an externally consistent strategy (not necessarily output by the same algorithm), then the empirical distribution of actions will converge to the Hannan set of the game, $\mathcal{H} = \bigcap_{i \in \mathcal{I}} \mathcal{H}_i$, which is therefore guaranteed to be non-empty.

The main difference between elements of the Hannan set and Nash equilibria is that, in the latter, the distribution must be a product distribution. So the set of Nash equilibria is always contained in $\mathcal{H}$, but might be, in some games, much smaller than $\mathcal{H}$.

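On the other hand, in zero-sum games, elements of the Hannan set satisfy the following property. If $q \in \Delta(\mathcal{A} \times \mathcal{B})$ belongs to $\mathcal{H}$, then, denoting by $q_1 \in \Delta(\mathcal{A})$ and $q_2 \in \Delta(\mathcal{B})$ its marginals, necessarily

\[ \min_{y \in \Delta(\mathcal{B})}\max_{x \in \Delta(\mathcal{A})} \rho(x, y) \le \max_{a \in \mathcal{A}} \rho(a, q_2) \le \rho(q) \le \min_{b \in \mathcal{B}} \rho(q_1, b) \le \max_{x \in \Delta(\mathcal{A})}\min_{y \in \Delta(\mathcal{B})} \rho(x, y). \]

Since $\max_{x \in \Delta(\mathcal{A})}\min_{y \in \Delta(\mathcal{B})} \rho(x, y) \le \min_{y \in \Delta(\mathcal{B})}\max_{x \in \Delta(\mathcal{A})} \rho(x, y)$ always holds, both quantities must coincide and, by definition, are equal to the value $v$ of the game. More importantly, the first and last inequalities above imply that

\[ \max_{a \in \mathcal{A}} \rho(a, q_2) = \max_{x \in \Delta(\mathcal{A})} \rho(x, q_2) = v \quad\text{and}\quad \min_{b \in \mathcal{B}} \rho(q_1, b) = \min_{y \in \Delta(\mathcal{B})} \rho(q_1, y) = v, \]

thus $(q_1, q_2)$ is a pair of optimal mixed actions.

As a consequence, in a zero-sum game, a player who follows unilaterally a consistent strategy obtains asymptotically at least the value. And if both players have consistent strategies, their empirical mixed actions converge to their sets of optimal mixed actions.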
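This convergence can be observed in self-play. The sketch below is a simplified illustration, not the construction of the text: both players run exponential weights with a fixed learning rate tuned to a known horizon, the averages are taken over mixed actions rather than sampled actions (an assumption that keeps the run deterministic), and the payoff matrix is a hypothetical example with value $1/3$.

```python
import numpy as np

def self_play(payoff, n):
    """Both players run exponential weights on expected payoffs; returns the averaged mixed actions."""
    A, B = payoff.shape
    eta_x = np.sqrt(8.0 * np.log(A) / n)
    eta_y = np.sqrt(8.0 * np.log(B) / n)
    gains_x = np.zeros(A)       # cumulative payoff of each action of the maximizer
    losses_y = np.zeros(B)      # cumulative payoff conceded by each action of the minimizer
    x_avg = np.zeros(A)
    y_avg = np.zeros(B)
    for _ in range(n):
        wx = np.exp(eta_x * (gains_x - gains_x.max()))
        wy = np.exp(-eta_y * (losses_y - losses_y.min()))
        x = wx / wx.sum()
        y = wy / wy.sum()
        x_avg += x / n
        y_avg += y / n
        gains_x += payoff @ y
        losses_y += x @ payoff
    return x_avg, y_avg

payoff = np.array([[0.5, 0.0], [0.0, 1.0]])   # hypothetical zero-sum game, value 1/3
x_avg, y_avg = self_play(payoff, 5000)
upper = float((payoff @ y_avg).max())         # best reply payoff against the averaged y
lower = float((x_avg @ payoff).min())         # guaranteed payoff of the averaged x
```

The value always lies between `lower` and `upper`, and the gap between them is bounded by the sum of the two average regrets, which is why it shrinks as the horizon grows.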

This property has been somehow generalized by Hart and Mas-Colell [33] to potential games, see also Viossat & Zapechelnyuk [78]. They have constructed a specific externally consistent strategy such that, if both players follow it, the product of empirical distributions of actions converges to the set of Nash equilibria (and more precisely to a subset of it on which payoffs are identical). However, this is only a global property (as opposed to unilateral properties), as both players must follow this specific strategy. Moreover, the result does not extend to arbitrary games, even those with a unique Nash equilibrium.

We proved, following Cesa-Bianchi & Lugosi [14] and Sorin [70], a min-max theorem due to von Neumann using externally consistent strategies. It is actually possible to get the following generalized version of Fan [18]. We first recall that a mapping $\rho$ on $\mathcal{A} \times \mathcal{B}$ is said to be concave-like if, for every $a, a' \in \mathcal{A}$ and $\alpha \in [0, 1]$, there exists $a^* \in \mathcal{A}$ such that $\rho(a^*, \cdot) \ge \alpha\rho(a, \cdot) + (1-\alpha)\rho(a', \cdot)$. Convexity-like is defined similarly.
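Theorem 2.4 Let $\mathcal{A}$ be a compact set, $\mathcal{B}$ any set and $\rho$ a concave-like convex-like mapping on $\mathcal{A} \times \mathcal{B}$, bounded from below and such that $\rho(\cdot, b)$ is upper-semicontinuous for every $b \in \mathcal{B}$. Then the zero-sum game on $\mathcal{A}$ and $\mathcal{B}$ has a value.

Proof: Let $\mathcal{B}'$ be any finite subset of $\mathcal{B}$ and consider an externally consistent strategy of the first player; its existence is ensured by the following Corollary 4.4. It also implies that, for every $\varepsilon > 0$, there exists a sequence $\delta_n > 0$ going to zero such that, at stage $n$,

\[ \inf_{b \in \mathcal{B}}\max_{a \in \mathcal{A}} \rho(a, b) \le \max_{a \in \mathcal{A}} \rho(a, b^*) \le \max_{a \in \mathcal{A}} \frac{1}{n}\sum_{m=1}^n \rho(a, b_m) \le \bar\rho_n + \varepsilon + \delta_n, \]

where $b^*$ is given by the definition of convexity-like applied to $\sum_{m=1}^n b_m/n$. On Nature's side, we can assume that her strategy is such that, at stage $n$, $b_n$ is an action realizing $\inf_{b \in \mathcal{B}'} \rho(a_n, b)$ up to $1/2^n$. As a consequence,

\[ \bar\rho_n \le \frac{1}{n}\sum_{m=1}^n \inf_{b \in \mathcal{B}'} \rho(a_m, b) + \frac{1}{n} \le \min_{b \in \mathcal{B}'} \frac{1}{n}\sum_{m=1}^n \rho(a_m, b) + \frac{1}{n} \le \min_{b \in \mathcal{B}'} \rho(a^*, b) + \frac{1}{n}, \]

where $a^*$ is given by the definition of concavity-like. As a consequence, taking $n$ and $\varepsilon$ to their limits yields that, for any finite subset $\mathcal{B}'$,

\[ \inf_{b \in \mathcal{B}}\max_{a \in \mathcal{A}} \rho(a, b) \le \sup_{a \in \mathcal{A}}\inf_{b \in \mathcal{B}'} \rho(a, b). \]

Since $\mathcal{A}$ is compact and $\rho(\cdot, b)$ is upper-semicontinuous, for every $\varepsilon$ and $\mathcal{B}'$ the set

\[ \mathcal{A}_\varepsilon[\mathcal{B}'] = \left\{a \in \mathcal{A} \text{ s.t. } \rho(a, b) \ge \inf_{b \in \mathcal{B}}\max_{a \in \mathcal{A}} \rho(a, b) - \varepsilon,\ \forall b \in \mathcal{B}'\right\} \]

is a compact non-empty set, and this remains true for any finite intersection over different subsets. As a consequence, the whole intersection (over every $\varepsilon$ and $\mathcal{B}'$) remains compact and non-empty, and any point $a$ in it must satisfy $\rho(a, b) \ge \inf_{b \in \mathcal{B}}\max_{a \in \mathcal{A}} \rho(a, b)$ for every $b \in \mathcal{B}$. Stated otherwise,

\[ \inf_{b \in \mathcal{B}}\max_{a \in \mathcal{A}} \rho(a, b) \le \max_{a \in \mathcal{A}}\inf_{b \in \mathcal{B}} \rho(a, b) \]

and the game has a value.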



Stronger results can be proved using internally consistent strategies. Aumann [1] defined the correlated equilibria of a game as the distributions on the set of action profiles $q \in \Delta\big(\prod_{i \in \mathcal{I}} A_i\big)$ such that, for every player $i \in \mathcal{I}$ and every action $a \in A_i$,

$$\rho_i(a, q_{-i}[a]) \geq \max_{a^* \in A_i} \rho_i(a^*, q_{-i}[a]), \quad \text{or} \quad q_i[a] \Big( \max_{a^* \in A_i} \rho_i(a^*, q_{-i}[a]) - \rho_i(a, q_{-i}[a]) \Big) \leq 0,$$

where $q_{-i}[a] \in \Delta\big(\prod_{j \neq i} A_j\big)$ is the probability induced by $q$ knowing that $a_i = a$ and $q_i[a]$ is the probability put on $a \in A_i$ by $q$ (or the relative frequency of action $a \in A_i$). In words, assume that a referee draws a lottery according to $q$ and only tells player $i$ an action he should play. Then, a correlated equilibrium is a joint distribution such that every player, when he is told to play action $a$ (and assuming that the others follow their recommendations), cannot gain strictly more by playing $a^*$ instead of $a$.

It is quite clear (from their very definition) that if every player follows unilaterally an internally consistent strategy then the empirical distribution of actions converges to the set of correlated equilibria (but maybe not to one specific correlated equilibrium), see Foster & Vohra [23].
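The second form of the inequality above (weighted by $q_i[a]$) is easy to test numerically on a finite game. The following is only a minimal sketch: the function name, the dictionary encoding of the game and the payoff values of the "chicken" example are illustrative assumptions, not taken from the text.

```python
def is_correlated_equilibrium(q, payoffs, tol=1e-9):
    """Check the correlated-equilibrium inequalities for a finite game.

    q        : dict {action profile (tuple): probability}
    payoffs  : list of dicts; payoffs[i][profile] is the payoff of player i
    (this encoding is an assumption made for the example)
    """
    n_players = len(payoffs)
    # Action set of each player, read off the payoff table.
    actions = [sorted({profile[i] for profile in payoffs[0]})
               for i in range(n_players)]
    for i in range(n_players):
        for a in actions[i]:            # recommended action
            for a_star in actions[i]:   # candidate deviation
                # gain = q_i[a] * (rho_i(a*, q_{-i}[a]) - rho_i(a, q_{-i}[a]))
                gain = 0.0
                for profile, prob in q.items():
                    if profile[i] != a:
                        continue
                    deviated = profile[:i] + (a_star,) + profile[i + 1:]
                    gain += prob * (payoffs[i][deviated] - payoffs[i][profile])
                if gain > tol:
                    return False
    return True


# Game of "chicken": D = dare, C = chicken (illustrative payoff values).
chicken = [
    {("D", "D"): 0, ("D", "C"): 7, ("C", "D"): 2, ("C", "C"): 6},  # player 0
    {("D", "D"): 0, ("D", "C"): 2, ("C", "D"): 7, ("C", "C"): 6},  # player 1
]
uniform_on_three = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
always_chicken = {("C", "C"): 1.0}
```

With these values, the uniform distribution on the three profiles other than (D, D) passes the test, whereas the distribution concentrated on (C, C) does not, since a player told to play C would gain by daring.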

2.3.2 Regret, (smooth) fictitious play and follow the perturbed leader 

Fictitious play is a classic unilateral discrete-time dynamic in game theory. At stage $n$, each player computes the empirical (either joint or product) distribution of actions of his opponents and plays a best response to it. Although quite natural, this strategy is not externally consistent. In contrast, Fudenberg & Levine [28] introduced a slight modification, called smooth fictitious play, whose regret is asymptotically smaller than $\varepsilon$ (where $\varepsilon > 0$ is fixed), see also Hofbauer, Sorin & Viossat [36].

Let $\rho_\varepsilon$ denote an $\varepsilon$-perturbation of $\rho$ (induced by $\psi : \Delta(A) \to \mathbb{R}$) defined by

$$\rho_\varepsilon(x, y) = \rho(x, y) + \varepsilon\psi(x), \quad \forall y \in \Delta(B).$$

Since we are interested in unilateral procedures, we might as well make a change of variable by defining $U = (\rho(a, y))_{a \in A} \in [0,1]^A$ so that $\rho(x, U) = \langle x, U \rangle$. As a consequence, the mapping can be rewritten as

$$\rho_\varepsilon(x, U) = \langle x, U \rangle + \varepsilon\psi(x).$$

We also define the $\varepsilon$-best response mapping by $\mathrm{BR}_\varepsilon(U) = \arg\max_{x \in \Delta(A)} \langle x, U \rangle + \varepsilon\psi(x)$.

We assume that the mapping $\psi : \Delta(A) \to \mathbb{R}$ is chosen so that

i) $\psi$ is a continuously differentiable mapping and $\|\psi\|_\infty \leq 1$;

ii) the $\varepsilon$-best response mapping $\mathrm{BR}_\varepsilon$ is single-valued and continuous;

iii) $\mathrm{BR}_\varepsilon(U)$ does not belong to the boundary of $\Delta(A)$.

Actually, point iii) ensures that $\rho_\varepsilon$ attains its maximum at a point where its first derivative vanishes. It can therefore be weakened into one of the following:

iii') for every $U \in [0,1]^A$, $D_1\rho_\varepsilon(\cdot, U)$ is equal to zero at $x = \mathrm{BR}_\varepsilon(U)$;

iii'') $D_1\rho_\varepsilon(\cdot, U)$ is orthogonal to the gradient of $\mathrm{BR}_\varepsilon$ at $U$.

The study of $\sigma(h^n) = \mathrm{BR}_\varepsilon(\bar U_n)$, the strategy associated with this perturbation, might be simpler in continuous time. First, we introduce the mapping $W_\varepsilon : [0,1]^A \to \mathbb{R}$ defined by

$$W_\varepsilon(U) = \sup_{x \in \Delta(A)} \rho_\varepsilon(x, U) = \langle \mathrm{BR}_\varepsilon(U), U \rangle + \varepsilon\psi(\mathrm{BR}_\varepsilon(U)).$$

In particular, because of point i), regret is asymptotically smaller than $2\varepsilon$ as soon as $\limsup_{n \to \infty} W_\varepsilon(\bar U_n) - \bar\rho_n \leq \varepsilon$. As in Section 1.4.1, the continuous-time dynamic associated with the discrete-time dynamic of $(\bar U_n, \bar\rho_n)$ is

$$(\dot U, \dot\rho) \in \big\{ (V, \langle \mathrm{BR}_\varepsilon(U), V \rangle) \ ;\ V \in [0,1]^A \big\} - (U, \rho).$$

Define $\Delta(t) = W_\varepsilon(U(t)) - \rho(t)$; then one has $\dot\Delta + \Delta \leq \varepsilon$, thus $\Delta(t) \leq \varepsilon + Me^{-t}$ for some constant $M$. As a consequence, $\Delta$ is a Lyapunov function with respect to the set

$$\big\{ (U, \rho) \in \mathbb{R}^A \times \mathbb{R} \ ;\ W_\varepsilon(U) - \rho \leq \varepsilon \big\}$$

which is thus a global attractor of the dynamic (see Benaïm, Hofbauer & Sorin [?]). So $(\bar U_n, \bar\rho_n)$ converges almost surely to it and the strategy is $\varepsilon$-externally consistent. Benaïm & Faure [B] proved recently that external consistency can be achieved (without requiring a doubling trick argument) with a smooth fictitious play with a vanishing step size; indeed, they showed that if $\varepsilon$ is not fixed but depends on the stage $n \in \mathbb{N}$, at a rate governed by some exponent $\gamma < 1$, then asymptotically the regret converges to zero.
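The differential inequality on $\Delta(t) = W_\varepsilon(U(t)) - \rho(t)$ used in the Lyapunov argument can be checked in a few lines. The computation below is a sketch under two assumptions that the text does not spell out: that $W_\varepsilon$ is differentiable with $\nabla W_\varepsilon(U) = \mathrm{BR}_\varepsilon(U)$ (the envelope property suggested by points ii) and iii)), and that the dynamic reads $\dot U = V - U$, $\dot\rho = \langle \mathrm{BR}_\varepsilon(U), V \rangle - \rho$ for some measurable $V(t) \in [0,1]^A$.

```latex
\begin{align*}
\dot\Delta
  &= \langle \nabla W_\varepsilon(U), \dot U \rangle - \dot\rho
   = \langle \mathrm{BR}_\varepsilon(U), V - U \rangle
     - \langle \mathrm{BR}_\varepsilon(U), V \rangle + \rho \\
  &= \rho - \langle \mathrm{BR}_\varepsilon(U), U \rangle
   = \rho - W_\varepsilon(U) + \varepsilon\,\psi(\mathrm{BR}_\varepsilon(U))
   \;\le\; -\Delta + \varepsilon
   \qquad (\text{since } \|\psi\|_\infty \le 1), \\[4pt]
\frac{d}{dt}\Big( e^{t}\,(\Delta(t) - \varepsilon) \Big)
  &= e^{t}\,\big( \dot\Delta(t) + \Delta(t) - \varepsilon \big) \;\le\; 0
  \quad\Longrightarrow\quad
  \Delta(t) \;\le\; \varepsilon + \big( \Delta(0) - \varepsilon \big)\, e^{-t}.
\end{align*}
```

Under these assumptions one can therefore take $M = \max\{\Delta(0) - \varepsilon, 0\}$ as the constant of the exponential bound.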

Smooth fictitious play (also known as follow the regularized leader) is a generalization of two classes of algorithms: exponential weight algorithms and their even more general version called follow the perturbed leader (see Cesa-Bianchi & Lugosi [14], Sections 4.2 and 4.3). To recover the first class of algorithms, entropy must be used as regularization, i.e., $\psi(x) = -\sum_{a \in A} x[a] \log(x[a])$; then, for every $\eta > 0$,

$$\mathrm{BR}_{1/\eta}(U)[a] = \frac{\exp(\eta U^a)}{\sum_{a' \in A} \exp(\eta U^{a'})},$$

which is, by definition, the exponential weight algorithm.
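The identity between the entropy-regularized best response and exponential weights can be checked numerically. The sketch below is illustrative only (function names and the test vector are assumptions): it computes the exponential-weight distribution and the regularized payoff $\langle x, U \rangle + \psi(x)/\eta$, whose maximal value is known to be the soft-max $\frac{1}{\eta}\log\sum_a \exp(\eta U^a)$.

```python
import math


def exponential_weights(U, eta):
    """Distribution proportional to exp(eta * U[a]) (shifted for numerical stability)."""
    shift = max(U)
    weights = [math.exp(eta * (u - shift)) for u in U]
    total = sum(weights)
    return [w / total for w in weights]


def regularized_payoff(x, U, eta):
    """<x, U> + (1/eta) * entropy(x), i.e. rho_eps(x, U) with eps = 1/eta."""
    entropy = -sum(p * math.log(p) for p in x if p > 0)
    return sum(p * u for p, u in zip(x, U)) + entropy / eta


def soft_max(U, eta):
    """(1/eta) * log(sum_a exp(eta * U[a])), the value W_{1/eta}(U)."""
    shift = max(U)
    return shift + math.log(sum(math.exp(eta * (u - shift)) for u in U)) / eta


U_example = [0.2, 0.9, 0.5]
eta_example = 3.0
x_star = exponential_weights(U_example, eta_example)
```

Evaluating `regularized_payoff(x_star, U_example, eta_example)` returns the same number as `soft_max(U_example, eta_example)` up to rounding, and any other distribution on the three actions gives a smaller value.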

Links with follow the perturbed leader (or Stochastic Fictitious Play according to Fudenberg & Kreps [26]) might be a bit more tedious. This algorithm does not choose a deterministic regularization $\varepsilon\psi$ but perturbs each component of $\bar U_n$ by a random quantity $\varepsilon_n^a$, such that the joint density $f : \mathbb{R}^A \to \mathbb{R}$ of the vector $(\varepsilon_n^a)_{a \in A}$ is independent of $\bar U_n$ and $n$. The action played at stage $n+1$ is any maximizer of $\bar U_n^a + \varepsilon_n^a$. In particular, a given action $a$ is chosen at this stage with probability $X^a(\bar U_n)$ where $X^a(\cdot)$ is defined by

$$X^a(U) = \mathbb{P}\Big\{ \arg\max_{a' \in A} U^{a'} + \varepsilon^{a'} = a \Big\}.$$

Follow the Perturbed Leader generates a discrete stochastic process $(\bar U_n, \bar\rho_n)$ which is an A.S.D. of the following differential inclusion

$$(\dot U, \dot\rho) \in \big\{ (V, \langle X(U), V \rangle) \ ;\ V \in [0,1]^A \big\} - (U, \rho).$$

This is a special case of Smooth Fictitious Play since, as soon as $f$ is positive and $X$ is continuously differentiable, Hofbauer & Sandholm [35] have shown that there exists a deterministic regularization $\varepsilon\psi$ such that $X(U) = \mathrm{BR}_\varepsilon(U)$. For example, in the case where the $\varepsilon^a$ are i.i.d. with cumulative distribution $F(x) = \exp(-\exp(-\eta x - \gamma))$ (where $\gamma$ is the Euler constant), follow the perturbed leader coincides exactly with the exponential weight algorithm (see e.g., Lemma 1 in McFadden [54]).
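The coincidence between follow the perturbed leader and exponential weights under this particular noise distribution can be illustrated by simulation. The sketch below is a Monte Carlo check under illustrative assumptions (function names, sample size, seed and the vector `U_demo` are not from the text): perturbations are drawn by inverting $F(x) = \exp(-\exp(-\eta x - \gamma))$, and the empirical frequency of each maximizer is compared with the exponential-weight probabilities.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def sample_perturbation(eta, rng):
    """Draw from the c.d.f. F(x) = exp(-exp(-eta * x - gamma)) by inversion."""
    u = rng.random()
    while u == 0.0:          # avoid log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -(math.log(-math.log(u)) + EULER_GAMMA) / eta


def ftpl_frequencies(U, eta, n_draws, seed=0):
    """Empirical law of argmax_a U[a] + noise[a] over independent noise draws."""
    rng = random.Random(seed)
    counts = [0] * len(U)
    for _ in range(n_draws):
        perturbed = [u + sample_perturbation(eta, rng) for u in U]
        counts[perturbed.index(max(perturbed))] += 1
    return [c / n_draws for c in counts]


def exponential_weight_law(U, eta):
    """Probabilities proportional to exp(eta * U[a])."""
    weights = [math.exp(eta * u) for u in U]
    total = sum(weights)
    return [w / total for w in weights]


U_demo = [0.2, 0.9, 0.5]
eta_demo = 2.0
empirical = ftpl_frequencies(U_demo, eta_demo, n_draws=100_000, seed=1)
target = exponential_weight_law(U_demo, eta_demo)
```

The two lists `empirical` and `target` agree up to sampling error, which is of order $1/\sqrt{\text{n\_draws}}$.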

As mentioned before, proofs based on A.S.D. do not exhibit rates of convergence (and this might be seen as a major drawback of these techniques). However, we only considered here strategies that do not depend on the past sequence of the player's actions (but only on the sequence of Nature's choices). So the discrete process is very close to the one induced by procedures in law (this is not the case for approachability, see Section 1.3), which is in turn close to the continuous-time process. It is actually possible to quantify these relative differences explicitly, see e.g., Sorin [72] or Kwon [40], in order to recover exact rates of convergence.



3 Calibration 

We recall that calibration is a criterion introduced by Dawid [16] in the following repeated game between a player and Nature. At each stage $n \in \mathbb{N}$, Nature chooses a state of the world $\omega_n$ in some finite set $\Omega$ and the player makes a prediction upon its law by choosing a probability distribution $p_n \in \Delta(\Omega)$. Strategies of the player and Nature are mappings from the set of finite histories $\bigcup_{n \in \mathbb{N}} (\Omega \times \Delta(\Omega))^n$ into, respectively, $\Delta(\Delta(\Omega))$ and $\Delta(\Omega)$.

The usual example, a meteorologist who predicts each day the probability of rain, corresponds to $\Omega = \{0, 1\}$, with $\omega = 1$ if it rains. This binary case is in fact much easier than the general case, as discussed in Section 3.1.2.



3.1 Finite ($\varepsilon$ and grid) calibration

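We will need the following notations. For every $p \in \Delta(\Omega)$ (seen as a subset of $\mathbb{R}^{|\Omega|}$) and $\varepsilon > 0$, let $\mathbb{N}_n[p, \varepsilon]$ be the set of stages where the prediction was $\varepsilon$-close to $p$, i.e.,

$$\mathbb{N}_n[p, \varepsilon] = \big\{ m \in \{1, \dots, n\} \ \text{s.t.}\ \|p_m - p\| \leq \varepsilon \big\},$$

where $\|\cdot\|$ is the Euclidean norm of $\mathbb{R}^{|\Omega|}$. We denote by $\bar\omega_n[p, \varepsilon] \in \Delta(\Omega)$ the empirical distribution of states on $\mathbb{N}_n[p, \varepsilon]$ and by $\bar p_n[p, \varepsilon]$ the average prediction on it.

Definition 3.1 A strategy $\sigma$ of the player is $\varepsilon$-calibrated if for every strategy $\tau$ of Nature, and for every $p \in \Delta(\Omega)$,

$$\limsup_{n \to \infty} \frac{|\mathbb{N}_n[p, \varepsilon]|}{n} \big\| \bar\omega_n[p, \varepsilon] - \bar p_n[p, \varepsilon] \big\| \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

A strategy is calibrated if it is $\varepsilon$-calibrated, for every $\varepsilon > 0$.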

Intuitively, a strategy is $\varepsilon$-calibrated if on the set of stages (assuming that it is big enough) where the prediction was $\varepsilon$-close to some $p \in \Delta(\Omega)$, the empirical distribution of states is close to this specific $p$. Although not stated explicitly in Definition 3.1, it is possible to require that rates of convergence are independent of Nature's strategy, see Section 4.2 below. With a careful concatenation of $\varepsilon$-calibrated strategies, following the doubling trick, one can easily obtain a calibrated strategy, as did Foster & Vohra [24] or Fudenberg & Levine [27]. It remains to construct such strategies, which can be done using the slightly weaker concept of calibrated strategies with respect to an $\varepsilon$-grid of $\Delta(\Omega)$ defined below.

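We recall that a finite subset $\{x[\ell] \ ;\ \ell \in \mathcal{L}\}$ of $\mathcal{K} \subset \mathbb{R}^d$ is an $\varepsilon$-grid of $\mathcal{K}$ if for every $x \in \mathcal{K}$, there exists $\ell \in \mathcal{L}$ such that $\|x - x[\ell]\| \leq \varepsilon$. Moreover, such a grid is regular if there exists $\{e_1, \dots, e_d\}$, $d$ linearly independent vectors, such that

$$\big\{ x[\ell] \ ;\ \ell \in \mathcal{L} \big\} = \Big\{ \sum_{k} n_k e_k \ ;\ n_k \in \mathbb{Z} \Big\} \cap \mathcal{K}.$$

Assume that the player can only make predictions on a grid $\{p[\ell] \ ;\ \ell \in \mathcal{L}\}$ of $\Delta(\Omega)$, so that a strategy is a mapping from the finite histories into $\Delta(\mathcal{L})$. The empirical distribution of states on $\mathbb{N}_n[\ell] := \{m \in \{1, \dots, n\} \ \text{s.t.}\ p_m = p[\ell]\}$ is denoted by $\bar\omega_n[\ell]$.

Definition 3.2 A strategy $\sigma$ of the player is calibrated with respect to $\{p[\ell] \ ;\ \ell \in \mathcal{L}\}$ if for every strategy $\tau$ of Nature, for every $\ell \in \mathcal{L}$,

$$\limsup_{n \to \infty} \frac{|\mathbb{N}_n[\ell]|}{n} \Big( \big\| \bar\omega_n[\ell] - p[\ell] \big\| - \min_{k \in \mathcal{L}} \big\| \bar\omega_n[\ell] - p[k] \big\| \Big) \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

In words, a strategy is calibrated with respect to a grid if on the set of stages where $p[\ell]$ is predicted, the empirical distribution of states is closer to $p[\ell]$ than to any other $p[k]$.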

Remark 3.1 Given a finite grid, the Voronoi cell associated with $p[\ell]$ is the set of points closer to $p[\ell]$ than to any other $p[k]$, i.e.,

$$V[\ell] := \Big\{ p \in \Delta(\Omega) \ \text{s.t.}\ \|p - p[\ell]\| \leq \min_{k \in \mathcal{L}} \|p - p[k]\| \Big\}.$$

Each Voronoi cell is a polytope since it is defined by a finite number of linear inequalities, their union covers $\Delta(\Omega)$ and any intersection of distinct cells has empty interior. The fact that the calibration score

$$\frac{|\mathbb{N}_n[\ell]|}{n} \Big( \big\| \bar\omega_n[\ell] - p[\ell] \big\| - \min_{k \in \mathcal{L}} \big\| \bar\omega_n[\ell] - p[k] \big\| \Big)$$

is non-positive means that $\bar\omega_n[\ell]$ belongs to (or converges to) the Voronoi cell $V[\ell]$.



Dawid [17] and Oakes [57] proved that there do not exist deterministic $\varepsilon$-calibrated strategies, based on a counter-example given in the following section. On the other hand, there exist random $\varepsilon$-calibrated strategies, as proved by Foster and Vohra [23] by exhibiting an algorithm that makes the so-called Brier score decrease to zero.

Theorem 3.1 For every grid, there exists a calibrated strategy with respect to it. As a consequence, for every $\varepsilon > 0$, there exist $\varepsilon$-calibrated strategies, and thus calibrated strategies.

To end this section, we note that finite calibration can also be defined with respect to some weights $\{\nu[\ell] \ ;\ \ell \in \mathcal{L}\}$. A strategy $\sigma$ of the player is weighted-calibrated with respect to $\{p[\ell], \nu[\ell] \ ;\ \ell \in \mathcal{L}\}$ if for every strategy $\tau$ of Nature, for every $\ell \in \mathcal{L}$,

$$\limsup_{n \to \infty} \frac{|\mathbb{N}_n[\ell]|}{n} \Big( \big( \| \bar\omega_n[\ell] - p[\ell] \|^2 - \nu[\ell] \big) - \min_{k \in \mathcal{L}} \big( \| \bar\omega_n[\ell] - p[k] \|^2 - \nu[k] \big) \Big) \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

Corollary 3.1 For every grid and weights, there exists a calibrated strategy with respect to them.

Given $\{p[\ell], \nu[\ell] \ ;\ \ell \in \mathcal{L}\}$, the Laguerre cell (or Power cell) associated with $p[\ell]$ and $\nu[\ell]$ is

$$P[\ell] := \Big\{ p \in \Delta(\Omega) \ \text{s.t.}\ \|p - p[\ell]\|^2 - \nu[\ell] \leq \min_{k \in \mathcal{L}} \|p - p[k]\|^2 - \nu[k] \Big\};$$

as in Remark 3.1, a weighted-calibrated strategy ensures that $\bar\omega_n[\ell]$ converges, as soon as the frequency of $\ell$ is not zero, to $P[\ell]$. Because of the squared norms, this set is also a polytope.

3.1.1 Discussion on the impossibility of deterministic calibration 

When $\Omega = \{0, 1\}$, Oakes [57] and Dawid [17] exhibited a strategy of Nature ensuring that no deterministic $\varepsilon$-calibrated strategy exists. Their idea is actually quite simple yet highly unstable. Define the strategy as follows: given the past history $h^n$,

$$\text{if } p_{n+1} \geq \tfrac{1}{2} \text{ then } \omega_{n+1} = 0 \quad \text{and if } p_{n+1} < \tfrac{1}{2} \text{ then } \omega_{n+1} = 1.$$

In words, if the forecaster claims that it will rain with high probability then Nature does not make it rain, and if he claims that it will not rain, Nature makes it rain.

This prevents any deterministic strategy from being $\varepsilon$-calibrated, but this is not immediate (and the proof, although quite simple, will shed light on the following discussion). We distinguish two cases: either the predictions of $1/2$ have an asymptotic positive frequency or a null frequency, i.e.,

$$\text{either} \quad \limsup_{n \to \infty} \frac{|\{m \leq n \ \text{s.t.}\ p_m = 1/2\}|}{n} > 0 \quad \text{or} \quad \lim_{n \to \infty} \frac{|\{m \leq n \ \text{s.t.}\ p_m = 1/2\}|}{n} = 0.$$

In the first case, $\bar p_n[1/2 + \varepsilon, \varepsilon] \geq 1/2$ while $\bar\omega_n[1/2 + \varepsilon, \varepsilon] = 0$, thus such a strategy is not $\varepsilon$-calibrated.

In the second case, we can assume that no prediction falls exactly at $1/2$ (since their frequency goes to zero). If the predictions bigger than $1/2$ have an asymptotic positive frequency, then necessarily, there must exist $p^*$ such that the set of stages where predictions belong to $[p^* - \varepsilon, p^* + \varepsilon] \subset [1/2, 1]$ also has a positive frequency. And since $\bar p_n[p^*, \varepsilon] \geq 1/2$ and $\bar\omega_n[p^*, \varepsilon] = 0$, the strategy is not $\varepsilon$-calibrated.

If the predictions bigger than $1/2$ have an asymptotic null frequency, then necessarily the predictions smaller than $1/2$ have an asymptotic positive frequency, and the same arguments hold (because we assumed that no predictions were equal to exactly $1/2$). So no deterministic strategy can be $\varepsilon$-calibrated.
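The argument can be illustrated on a concrete deterministic forecaster. The sketch below is only an example under stated assumptions: Nature answers $0$ when the forecast is at least $1/2$ and $1$ otherwise, the forecaster (a hypothetical choice, not from the text) predicts the empirical frequency of rain so far, and the score is the frequency-weighted gap between empirical frequency and average prediction on the stages whose forecast is $\varepsilon$-close to $p$.

```python
def oakes_dawid_outcome(p):
    """Nature's reply: no rain (0) if the forecast is at least 1/2, rain (1) otherwise."""
    return 0 if p >= 0.5 else 1


def empirical_frequency_forecaster(outcomes):
    """A deterministic forecaster (illustrative): past frequency of rain, 1/2 at the start."""
    if not outcomes:
        return 0.5
    return sum(outcomes) / len(outcomes)


def simulate(forecaster, n_stages):
    """Play n_stages of the forecasting game against the Oakes-Dawid strategy."""
    preds, outcomes = [], []
    for _ in range(n_stages):
        p = forecaster(outcomes)
        preds.append(p)
        outcomes.append(oakes_dawid_outcome(p))
    return preds, outcomes


def calibration_score(preds, outcomes, p, eps):
    """(|N_n[p, eps]| / n) * |empirical frequency - average prediction| on N_n[p, eps]."""
    n = len(preds)
    # Small slack so that forecasts at distance exactly eps are not lost to rounding.
    stages = [m for m in range(n) if abs(preds[m] - p) <= eps + 1e-12]
    if not stages:
        return 0.0
    avg_pred = sum(preds[m] for m in stages) / len(stages)
    avg_out = sum(outcomes[m] for m in stages) / len(stages)
    return len(stages) / n * abs(avg_out - avg_pred)


preds, outcomes = simulate(empirical_frequency_forecaster, 1000)
score = calibration_score(preds, outcomes, p=0.6, eps=0.1)
```

Here the forecaster predicts exactly $1/2$ on every other stage, which corresponds to the first case of the proof: on the ball of radius $0.1$ around $0.6$ the average prediction is $1/2$ while it never rains, so the score stays bounded away from zero.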

On the other hand, consider the deterministic strategy of the player that predicts at odd stages $p_n = 1/2$ and at even stages $p_n = 1/2 - 1/n$. The only accumulation point of the sequence of predictions is $1/2$, so for every $p \neq 1/2$ there is a finite number of predictions $\varepsilon$-close to $p$, for every $\varepsilon$ smaller than some $\varepsilon_p > 0$. On the other hand, for $p = 1/2$, no matter $\varepsilon$, if $n$ is big enough, $\mathbb{N}_n[0.5, \varepsilon]$ contains approximately half predictions strictly below $1/2$ and half equal to $1/2$, so the empirical distribution is asymptotically equal to $1/2$. As a consequence, for every $\varepsilon > 0$ and every $p \neq 0.5$,

$$\limsup_{n \to \infty} \frac{|\mathbb{N}_n[0.5, \varepsilon]|}{n} \big\| \bar p_n[0.5, \varepsilon] - \bar\omega_n[0.5, \varepsilon] \big\| = 0 \quad \text{and} \quad \lim_{\varepsilon' \to 0} \limsup_{n \to \infty} \frac{|\mathbb{N}_n[p, \varepsilon']|}{n} \big\| \bar p_n[p, \varepsilon'] - \bar\omega_n[p, \varepsilon'] \big\| = 0.$$

Obviously, this does not contradict the counter-example of Oakes [57] and Dawid [17]. The reason is that, on the stages when the prediction is $\varepsilon$-close to $p^* = 1/2 + \varepsilon$, the average prediction is $1/2$ while the empirical state is $0$. But one might argue that predictions are actually never close to $p^*$ (but $\varepsilon$-away), so the argument of Oakes and Dawid fails if calibration is defined only with respect to those points $p$ that are accumulation points of the sequence of predictions (i.e., there are predictions arbitrarily close to them).

This argument can be generalized to any stationary strategy of Nature (i.e., if $\omega_n = f(p_n)$ for some fixed but possibly random mapping $f$). Unfortunately, we are unable to claim that there exist deterministic ($\varepsilon$-)calibrated strategies with respect to accumulation points, but this shows how unstable the very concept of calibration is with respect to small variations in definitions or objectives. This subject is developed further in Section 3.3.



3.1.2 Efficient calibration in the binary case



Foster [21] designed an algorithm that efficiently computes an $\varepsilon$-calibrated strategy in the binary case (although it seems that Abernethy, Bartlett & Hazan [1] were the ones who noticed its efficiency). The idea is to consider a calibrated strategy with respect to the regular grid $\{p[\ell] := \varepsilon + 2\ell\varepsilon \ ;\ \ell \in \mathcal{L}\}$ where $\mathcal{L} := \{0, 1, \dots, (\lfloor \varepsilon^{-1} \rfloor - 1)/2\}$. Following Foster's notation, we define, for every $\ell \in \mathcal{L}$,

$$e_n^\ell = \frac{|\mathbb{N}_n[\ell]|}{n} \Big( \bar\omega_n[\ell] - (p[\ell] + \varepsilon) \Big) \quad \text{and} \quad d_n^\ell = \frac{|\mathbb{N}_n[\ell]|}{n} \Big( (p[\ell] - \varepsilon) - \bar\omega_n[\ell] \Big),$$

so that a strategy is calibrated if, asymptotically, every $d_n^\ell$ and $e_n^\ell$ is smaller than zero. Foster's algorithm consists in finding at stage $n$ an element $\ell^* \in \mathcal{L}$ such that

- either both $e_n^{\ell^*} \leq 0$ and $d_n^{\ell^*} \leq 0$; in that case, predict $p[\ell^*]$;

- or $e_n^{\ell^*-1} > 0$ and $d_n^{\ell^*} > 0$; in that case randomize between $p[\ell^*]$ and $p[\ell^*-1]$, with probabilities proportional to $d_n^{\ell^*}$ and $e_n^{\ell^*-1}$.

Existence of such an $\ell^*$ is ensured by the fact that the first $d_n^\ell$ and the last $e_n^\ell$ are always non-positive. Computations show that the error converges to zero.

So the tricky remaining part consists in finding this $\ell^*$ efficiently. To this purpose, Abernethy, Bartlett & Hazan [1] introduced, for every $\ell \in \mathcal{L}$, the quantity

$$\theta^\ell = e_n^\ell \ \text{if } e_n^\ell > 0, \quad \theta^\ell = -d_n^\ell \ \text{if } d_n^\ell > 0 \quad \text{and} \quad \theta^\ell = 0 \ \text{otherwise},$$

which is well defined since $d_n^\ell$ and $e_n^\ell$ cannot be simultaneously positive. Specifically, it always holds that $\theta^\ell \geq 0$ for the first index and $\theta^\ell \leq 0$ for the last one, so if any of them is equal to zero, Foster's strategy dictates to predict the corresponding point. Otherwise, one must find $\ell^*$ such that $\theta^{\ell^*-1} > 0$ and $\theta^{\ell^*} < 0$, and the main argument is that it can be done through a binary search, thus in $O(\log(1/\varepsilon))$ steps.
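The binary search can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the return convention are assumptions, and the sequence `theta` is taken as given (in practice each entry would be computed only when the search inspects it, which is what keeps the number of evaluations logarithmic).

```python
def find_foster_index(theta):
    """Binary search on a sequence with theta[0] >= 0 and theta[-1] <= 0.

    Returns ("single", l) if theta[l] == 0 (predict the grid point l), or
    ("pair", l) with theta[l - 1] > 0 > theta[l] (randomize between l - 1 and l).
    """
    lo, hi = 0, len(theta) - 1
    if theta[lo] < 0 or theta[hi] > 0:
        raise ValueError("expected theta[0] >= 0 and theta[-1] <= 0")
    if theta[lo] == 0:
        return ("single", lo)
    if theta[hi] == 0:
        return ("single", hi)
    # Invariant from here on: theta[lo] > 0 > theta[hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if theta[mid] == 0:
            return ("single", mid)
        if theta[mid] > 0:
            lo = mid
        else:
            hi = mid
    return ("pair", hi)
```

For instance, on `[0.3, 0.1, 0.2, -0.1, 0.4, -0.2]` the search inspects indices 2 and 3 and stops at the sign change between them; it does not need the sequence to be monotone, only the signs at the two ends.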

Foster's strategy can be generalized to more than two outcomes (see e.g., Mannor & Stoltz [50]) although, unfortunately, at the cost of efficiency since the binary search trick does not extend.



3.2 Generalization 

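Recall that, roughly speaking, a strategy is calibrated if on the set of stages where the prediction was close to $p$, the average prediction and the empirical distribution of outcomes asymptotically coincide. General concepts of calibration are induced by a different definition of closeness.

Let $\mathcal{F}$ be a family of Borel measurable subsets of $\Delta(\Omega)$ and denote, for every $F \in \mathcal{F}$,

$$\mathbb{N}_n[F] = \{ m \leq n \ \text{s.t.}\ p_m \in F \}, \quad \bar\omega_n[F] = \frac{\sum_{m \in \mathbb{N}_n[F]} \delta_{\omega_m}}{|\mathbb{N}_n[F]|} \quad \text{and} \quad \bar p_n[F] = \frac{\sum_{m \in \mathbb{N}_n[F]} p_m}{|\mathbb{N}_n[F]|},$$

respectively the set of stages where the prediction was in $F$ (before the $n$-th), the empirical distribution of outcomes and the average prediction on it.

Definition 3.3 A strategy $\sigma$ of the player is $\mathcal{F}$-calibrated if for every strategy $\tau$ of Nature,

$$\limsup_{n \to \infty} \sup_{F \in \mathcal{F}} \frac{|\mathbb{N}_n[F]|}{n} \big\| \bar\omega_n[F] - \bar p_n[F] \big\| \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$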

Several types of families have been considered by various authors. For instance, from the most to the least complicated:

i) Mannor & Stoltz [50] treated the most difficult problem where $\mathcal{F}$ is the family of all Borel measurable subsets of $\Delta(\Omega)$;

ii) Rakhlin, Sridharan & Tewari [66] considered the family of every possible $\ell_1$ ball;

iii) Perchet [60] defined $\mathcal{F}$ to be some neighborhood basis of $\Delta(\Omega)$.

In the first case, the minmax techniques of Rakhlin, Sridharan and Tewari [65, 66] upper-bound the calibration error at stage $n$ (but with a strategy that depends on $n$)
in O (n~^/^\^\^^^'^ while for the two last cases the bound shrinks to O (n""*^/^). On the 
other hand, Mannor & Stoltz [50] and Perchet [60] obtained (actually before) the same results, yet in a constructive way. They are developed in Section 4.2.

Drawbacks of these definitions of calibration (which will lead to another type of generalization) are illustrated by the following examples.

Assume that $\Omega = \{0, 1\}$ and that the sequence of outcomes is $0, 1, 0, 1, 0, \dots$ (i.e., $\omega_n = 1$ iff $n$ is even). Consider a player that predicts, at every stage, that the probability of $1$ is exactly $1/2$. Then this strategy is calibrated according to any of the previous definitions of calibration. On the other hand, on the set of even stages, the empirical distribution is $1$ while the average prediction is $1/2$, which contradicts the precepts of calibration.

Even more intricate: assume that $\Omega = \{0, 1, 2\}$, that $\omega_n = 0$ with probability $1/3$ and that $1$ and $2$ alternate on the remaining set of stages. The sequence of outcomes on any fixed subset of $\mathbb{N}$ contains asymptotically as many $0$ as $1$ and $2$, so predicting $(1/3, 1/3, 1/3)$ at every stage is not contradicted. On the other hand, if we consider only the set of stages where the outcome was $1$ or $2$ then the (conditional) prediction is always $(1/2, 1/2)$ while $1$ and $2$ alternate.
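The first example can be reproduced in a few lines. The sketch below uses illustrative names and a horizon of 1000 stages (both assumptions): it computes the frequency-weighted gap between the empirical frequency of outcome $1$ and the average prediction, first on all stages, then on the even stages only.

```python
def weighted_gap(preds, outcomes, stages):
    """(|stages| / n) * |frequency of outcome 1 - average prediction| on `stages`."""
    n = len(preds)
    if not stages:
        return 0.0
    avg_pred = sum(preds[m] for m in stages) / len(stages)
    freq_one = sum(outcomes[m] for m in stages) / len(stages)
    return len(stages) / n * abs(freq_one - avg_pred)


n_stages = 1000
# Outcome at stage 1, 2, 3, ... is 0, 1, 0, ... (1 iff the stage number is even).
alternating = [1 if stage % 2 == 0 else 0 for stage in range(1, n_stages + 1)]
constant_half = [0.5] * n_stages

all_stages = list(range(n_stages))
even_stages = [m for m in range(n_stages) if (m + 1) % 2 == 0]

gap_all = weighted_gap(constant_half, alternating, all_stages)
gap_even = weighted_gap(constant_half, alternating, even_stages)
```

The gap vanishes on the whole set of stages, which is all that the previous definitions can see since the prediction never changes, but it equals $1/4$ on the even stages.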

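We introduce the following concepts of checking rules. Let $\mathcal{U}$ and $\mathcal{T}$ be respectively an active universe mapping and a testing mapping, i.e.,

$$\mathcal{U} : \bigcup_{n \in \mathbb{N}} (\Delta(\Omega) \times \Omega)^n \rightrightarrows \Delta(\Omega) \times \Omega \quad \text{and} \quad \mathcal{T} : \bigcup_{n \in \mathbb{N}} (\Delta(\Omega) \times \Omega)^n \rightrightarrows \Delta(\Omega) \times \Omega$$

such that $\mathcal{T}(h^n) \subset \mathcal{U}(h^n)$. The interpretation is that stage $n+1$ is active if $(p_{n+1}, \omega_{n+1})$ belongs to the active universe $\mathcal{U}(h^n)$; given a set of active stages, calibration compares the empirical frequency of the tested event with the average prediction of this event.

Such a pair $(\mathcal{U}, \mathcal{T})$ forms a checking rule and we define as before the set of active stages

$$\mathbb{N}_n[\mathcal{U}, \mathcal{T}] = \big\{ m \leq n \ \text{s.t.}\ (p_m, \omega_m) \in \mathcal{U}(h^{m-1}) \big\},$$

the empirical probability of tested events

$$\bar\omega_n[\mathcal{U}, \mathcal{T}] = \frac{\sum_{m \in \mathbb{N}_n[\mathcal{U}, \mathcal{T}]} \mathbf{1}\{ (p_m, \omega_m) \in \mathcal{T}(h^{m-1}) \}}{|\mathbb{N}_n[\mathcal{U}, \mathcal{T}]|}$$

and the average predicted conditional probability of tested events

$$\bar p_n[\mathcal{U}, \mathcal{T}] = \frac{\sum_{m \in \mathbb{N}_n[\mathcal{U}, \mathcal{T}]} p_m\{ \mathcal{T}(h^{m-1}) \mid \mathcal{U}(h^{m-1}) \}}{|\mathbb{N}_n[\mathcal{U}, \mathcal{T}]|}.$$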

Definition 3.4 A strategy $\sigma$ is calibrated with respect to some given checking rule $(\mathcal{U}, \mathcal{T})$ if, for every strategy $\tau$ of Nature,

$$\limsup_{n \to \infty} \frac{|\mathbb{N}_n[\mathcal{U}, \mathcal{T}]|}{n} \big| \bar\omega_n[\mathcal{U}, \mathcal{T}] - \bar p_n[\mathcal{U}, \mathcal{T}] \big| \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.},$$

with the assumption that $p\{A \mid B\} = +\infty$ if $p\{B\} = 0$.

The following theorem (a weaker version first appeared in Lehrer [41]) continues the discussion of Section 3.1.1 and further weakens the range of the counterexample of Oakes and Dawid. It shows that if checking rules do not depend on current predictions (but possibly on past predictions), then deterministic calibration does exist; this is quite obvious if one faces only one checking rule, but the result actually holds with an infinite number of them.

To be formal, we endow the set of checking rules independent of current predictions (i.e., pairs of mappings from $\bigcup_{n \in \mathbb{N}} (\Delta(\Omega) \times \Omega)^n$ into subsets of $\Omega$) with the cylinder topology.

Theorem 3.2 Let $\lambda$ be a probability distribution on the set of checking rules independent of current predictions. Then there exists a deterministic strategy $\sigma$ that is calibrated with respect to $\lambda$-almost every checking rule.

Actually, the result that we shall prove is stronger as we will show that, as soon as $|\mathbb{N}_n[\mathcal{U}, \mathcal{T}]|$ goes to infinity,

$$\limsup_{n \to \infty} \big| \bar\omega_n[\mathcal{U}, \mathcal{T}] - \bar p_n[\mathcal{U}, \mathcal{T}] \big| \leq 0, \quad \mathbb{P}_{\sigma,\tau}\text{-a.s.}$$

A similar result (that extends Foster & Vohra [24]) due to Sandroni, Smorodinsky & Vohra [68] deals with checking rules depending on current predictions, under the following extra assumptions. We assume that the calibration test compares the empirical distribution of outcomes with the average prediction on the set of active stages where predictions were in some given set $F \subset \Delta(\Omega)$; activeness of stages might depend on past histories. Formally, $\mathcal{U}(h^n)$ is either empty (so the stage $n+1$ is not active) or $\mathcal{U}(h^n) = F \times \Omega$. The mapping $\mathcal{T}$ is, on the other hand, constant, i.e., $\mathcal{T}(h^n) = F \times \{\omega\}$ for some $\omega \in \Omega$ (at least on active stages).

Proposition 3.2 Consider a countable number of such checking rules. Then there exists a strategy of the player that is calibrated with respect to every one of them.



3.3 Smooth calibration

Smooth calibration is another criterion (close to usual calibration) that can be satisfied with a deterministic strategy, as proved by Foster and Kakade. Even more surprisingly, it can be used to output a calibrated strategy, showing again the instability of Oakes and Dawid's result.

The idea is to smooth the definitions of calibration. Indeed, notice that given $F \subset \Delta(\Omega)$, the calibration score can be written as

$$\frac{|\mathbb{N}_n[F]|}{n} \big\| \bar\omega_n[F] - \bar p_n[F] \big\| = \Big\| \frac{1}{n} \sum_{m=1}^{n} \mathbf{1}\{p_m \in F\} (\omega_m - p_m) \Big\|$$

(where $\omega_m$ is identified with the Dirac mass $\delta_{\omega_m}$) and the mapping $p \mapsto \mathbf{1}\{p \in F\}$ is not continuous. Instead, given some continuous mapping $g : \Delta(\Omega) \to [0, 1]$, consider the following smoothed version of the score

$$\Big\| \frac{1}{n} \sum_{m=1}^{n} g(p_m) (\omega_m - p_m) \Big\|.$$

With respect to some checking rule $(\mathcal{U}, \mathcal{T})$ independent of the current predictions (so that $\mathcal{U}(h^m)$ and $\mathcal{T}(h^m)$ can be seen as subsets of $\Omega$), this score becomes

$$\Big| \frac{1}{n} \sum_{m=1}^{n} g(p_m) \mathbf{1}\{\omega_m \in \mathcal{U}(h^{m-1})\} \Big( \mathbf{1}\{\omega_m \in \mathcal{T}(h^{m-1})\} - p_m\{ \mathcal{T}(h^{m-1}) \mid \mathcal{U}(h^{m-1}) \} \Big) \Big|.$$

A weaker version of the following proposition has been proved independently by Kakade & Foster [37] and Vovk, Nouretdinov, Takemura & Shafer [80]: the former named this property weak calibration, but we used the term weak with another meaning (i.e., when the horizon of the game is fixed and known).

Proposition 3.3 There exists a deterministic strategy $\sigma$ of the player such that, no matter Nature's strategy, for every continuous mapping $g : \Delta(\Omega) \to \mathbb{R}_+$,

$$\limsup_{n \to \infty} \Big\| \frac{1}{n} \sum_{m=1}^{n} g(p_m) (\omega_m - p_m) \Big\| \leq 0.$$

If $\mu$ is a probability distribution on the set of checking rules independent of current predictions, then there exists a deterministic $\sigma$ such that

$$\lim_{n \to \infty} \Big| \frac{1}{n} \sum_{m=1}^{n} g(p_m) \mathbf{1}\{\omega_m \in \mathcal{U}(h^{m-1})\} \Big( \mathbf{1}\{\omega_m \in \mathcal{T}(h^{m-1})\} - p_m\{ \mathcal{T}(h^{m-1}) \mid \mathcal{U}(h^{m-1}) \} \Big) \Big| \leq 0,$$

for $\mu$-almost every checking rule and every continuous mapping $g$, no matter Nature's strategy.

As noticed by Foster and Kakade, the convergence in the first part of the result can be made uniform with respect to Nature's strategy.

Actually, the most surprising and interesting property of smooth calibration is not so much that there exist deterministic smooth calibrated algorithms, but that they can be used to construct an almost deterministic $\varepsilon$-calibrated strategy as follows; see Kakade & Foster [37] for more details.

Let $\varepsilon$ be fixed and consider a finite $\varepsilon$-triangulation of $\Delta(\Omega)$ whose set of vertices is $\mathcal{V} := \{v_1, \dots, v_V\}$. Any $p \in \Delta(\Omega)$ belongs to one simplex of the triangulation and we denote by $\mathcal{V}(p)$ its vertices (if there is more than one such simplex, then choose one arbitrarily). The point $p$ can be written as a convex combination of vertices in $\mathcal{V}(p)$, i.e., $p = \sum_{v \in \mathcal{V}(p)} \mu_v(p) v$, and it is even possible to decompose $p = \sum_{v \in \mathcal{V}} \mu_v(p) v$ by setting $\mu_v(p) = 0$ for any vertex $v$ that does not belong to $\mathcal{V}(p)$. All those mappings $\mu_v$ are continuous and Lipschitz.

We construct an $\varepsilon$-calibrated strategy $\sigma$ using a fixed deterministic smooth calibrated strategy $\sigma^s$ in the following way. Whenever $\sigma^s$ dictates to predict $p \in \Delta(\Omega)$, $\sigma$ predicts $v \in \mathcal{V}$ with probability $\mu_v(p)$. Immediate calculations show that, for every $v \in \mathcal{V}$, denoting by $v_m$ the vertex actually predicted at stage $m$ and taking expectations with respect to the randomization of $\sigma$,

$$\frac{1}{n} \sum_{m=1}^{n} \mathbb{E}\big[ \mathbf{1}\{v_m = v\} \big] (\omega_m - v) = \frac{1}{n} \sum_{m=1}^{n} \mu_v(p_m) (\omega_m - v) = \frac{1}{n} \sum_{m=1}^{n} \mu_v(p_m) (\omega_m - p_m) + \frac{1}{n} \sum_{m=1}^{n} \mu_v(p_m) (p_m - v).$$

Since $\mu_v(p_m) \|p_m - v\| \leq \varepsilon$, the expected calibration score (and the actual score, thanks to concentration inequalities) is $\varepsilon$-close to the smooth calibration score, hence the result.

The key feature of this construction is that, although it is impossible to construct an $\varepsilon$-calibrated strategy deterministically (as proved by Oakes and Dawid), it becomes possible by using randomizations on arbitrarily small balls. This is why we used the term almost deterministic strategies.
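In the binary case the construction is particularly simple, since $\Delta(\Omega)$ can be identified with $[0, 1]$ and a triangulation is just a grid of points. The sketch below illustrates the randomized rounding step under that assumption; the function names, the regular grid with `n_cells` cells and the test value $0.37$ are illustrative choices, and the smooth calibrated strategy producing $p$ is not implemented here.

```python
import random


def barycentric_weights(p, n_cells):
    """Weights mu_v(p) on the two grid vertices k / n_cells surrounding p in [0, 1].

    Returns a dict {vertex index: weight}; all other vertices have weight 0.
    """
    k = min(int(p * n_cells), n_cells - 1)   # index of the left vertex
    right = p * n_cells - k                   # weight of the right vertex
    return {k: 1.0 - right, k + 1: right}


def round_prediction(p, n_cells, rng):
    """Predict a vertex at random so that the expected prediction equals p."""
    weights = barycentric_weights(p, n_cells)
    k = min(weights)                          # left vertex index
    if rng.random() < weights[k + 1]:
        return (k + 1) / n_cells
    return k / n_cells


demo_weights = barycentric_weights(0.37, 10)
demo_rng = random.Random(0)
demo_draws = [round_prediction(0.37, 10, demo_rng) for _ in range(20_000)]
```

For $p = 0.37$ and ten cells, the only possible outputs are the vertices $0.3$ and $0.4$, drawn with probabilities close to $0.3$ and $0.7$, so the randomization stays within one cell of the deterministic prediction while its average remains $p$.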



Concerning the complexity of (weak) calibration, a recent result of Hazan & Kakade, based on an idea of Kakade & Foster [37], shows that it is a hard criterion to satisfy. Indeed, an almost deterministic strategy $\sigma$ (based on some triangulation of $\Delta(\Omega)$) can be used to find $\varepsilon$-Nash equilibria of games. We sketch the proof in the following.

Consider a game between a set of players $\mathcal{I}$ with action sets $A_i$ and payoff mappings $\rho_i$. Define $\Omega = \prod_{i \in \mathcal{I}} A_i$ and let $X_i : \Delta(\Omega) \to \Delta(A_i)$ be a smooth $\varepsilon$-best response of player $i$ (i.e., given any $p \in \Delta(\Omega)$, if $p^{-i}$ denotes the marginal of $p$ on the actions of the opponents of $i$, then $\rho_i(X_i(p), p^{-i}) \geq \max_{a \in A_i} \rho_i(a, p^{-i}) - \varepsilon$).

We denote by $p_n \in \Delta(\Omega)$ the prediction output at stage $n$ by the strategy $\sigma$ and we assume that player $i$ plays according to $X_i(p_n)$. The profile of actions actually played is $\omega_n \in \Omega$ and one has $\mathbb{E}[\omega_n] = (X_1(p_n), \dots, X_I(p_n)) =: X(p_n)$. Since $\sigma$ is $\varepsilon$-calibrated, for every vertex $v$ and with probability one,

$$\limsup_{n \to \infty} \frac{\sum_{m=1}^{n} \mathbf{1}\{p_m = v\}}{n} \left( \left\| \frac{\sum_{m=1}^{n} \mathbf{1}\{p_m = v\} (v - \omega_m)}{\sum_{m=1}^{n} \mathbf{1}\{p_m = v\}} \right\| - \varepsilon \right) \leq 0.$$

Concentration inequalities, and the fact that $X(p_n) - \omega_n$ and $\mathbf{1}\{p_n = v\} - \mu_v(p_n)$ are martingale differences, imply that, with probability one,

$$\lim_{n \to \infty} \frac{\sum_{m=1}^{n} \mathbf{1}\{p_m = v\} (X(v) - \omega_m)}{n} = 0 \quad \text{and} \quad \lim_{n \to \infty} \frac{\sum_{m=1}^{n} \big( \mathbf{1}\{p_m = v\} - \mu_v(p_m) \big)}{n} = 0.$$

As a consequence, summing terms, for every vertex $v$ that is predicted with a positive density (i.e., such that $\limsup_{n \to \infty} \sum_{m=1}^{n} \mathbf{1}\{p_m = v\}/n > 0$), one must have $\|v - X(v)\| \leq \varepsilon$. And so, by the very definition of $X(\cdot)$, $v$ must be a $2\varepsilon$-Nash equilibrium.

Therefore, not only does the empirical profile of actions converge to the convex hull of $\varepsilon$-Nash equilibria, but also if a stage $n$ is chosen at random then, with arbitrarily great probability, $X(v_n)$ is a $2\varepsilon$-Nash equilibrium.



4 Equivalences between approachability, regret and calibration

This part is mainly devoted to describing how approachability can be used to construct consistent and calibrated strategies. We also show why calibration is an important and useful tool, as it can be used to construct general consistent strategies (and even approachability strategies). Since we can also reduce approachability to regret, this completes the circle and this is the reason why we called these notions equivalent.

4.1 Using approachability to get regret

4.1.1 From approachability to regret; the finite case

Although Blackwell [10] was the first to notice that consistent strategies can be constructed using approachability theory, we first treat the idea of Hart & Mas-Colell [32] in finite dimension.

We recall that choices of actions $a_n \in A$ and $b_n \in B$ generate at stage $n \in \mathbb{N}$ an external regret $r_n$ defined by

$$r_n = r(a_n, b_n) := \big( \rho(1, b_n) - \rho(a_n, b_n), \dots, \rho(A, b_n) - \rho(a_n, b_n) \big) \in \mathbb{R}^A$$

and that a strategy is externally consistent if $\|\bar r_n^+\|_\infty$ goes to $0$ almost surely. Actually, using approachability theory, Hart & Mas-Colell [32] proved the following.

Proposition 4.1 The strategy $\sigma$ defined by playing, at stage $n+1$, proportionally to $\bar r_n^+$ (and arbitrarily if every component is non-positive) is externally consistent. Moreover,
for every strategy r of Nature and n G 



< 



and, for every rj > 0, T^t^t {sup7v>n Ikn II ^ ^} ^ 3exp (— ) as soon as > 1 



'J 

Proof: We simply have to prove that $\sigma$ is exactly Blackwell's approachability strategy of the negative orthant $\mathbb{R}^A_-$ (which is a cone) in the game where the vector payoff is $r(a, b)$. This is a consequence of the following geometric property:

$$\text{No matter the choice of } x \in \Delta(A), \quad \big\langle x, \mathbb{E}_x[r(a, b)] \big\rangle = 0, \quad \text{for all } b \in B.$$

Indeed, the $k$-th component of $\mathbb{E}_x[r(a, b)]$ is, by linearity, $\rho(k, b) - \rho(x, b)$, thus the inner product is equal to $\sum_{a \in A} x[a] \big( \rho(a, b) - \rho(x, b) \big) = \rho(x, b) - \rho(x, b) = 0$.

Since $x_{n+1} = \sigma(h^n)$ is proportional to $\bar r_n^+$, the geometric property implies that

$$\big\langle \bar r_n^+, \mathbb{E}_{x_{n+1}}[r_{n+1}] \big\rangle = 0 \quad \text{thus} \quad \big\langle \bar r_n - \bar r_n^-, \mathbb{E}_{x_{n+1}}[r_{n+1}] - \bar r_n^- \big\rangle = 0$$

because one always has $\langle z^+, z^- \rangle = 0$. Since $\bar r_n^-$ is the projection of $\bar r_n$ on the negative orthant, this proves that $\sigma$ satisfies Blackwell's property, hence is an approachability strategy. Bounds follow from Corollary 1.15. ■



Once the reduction from external regret minimization to approachability of $\mathbb{R}^A_-$ has been made, the existence of externally consistent strategies is immediate because $\mathbb{R}^A_-$ is obviously a convex approachable set. Indeed, for every $y \in \Delta(B)$, there exists $x \in \Delta(A)$ such that $r(x, y) \in \mathbb{R}^A_-$: it suffices to take for $x$ any best response to $y$. The most interesting feature of Proposition 4.1 is that the strategy is very simple and natural: the more regret a specific action induces, the more it should be played (with a weight exactly proportional to the regret generated).

Generalizations to the compact case (when Nature chooses at stage $n \in \mathbb{N}$ an outcome vector $U_n \in [0, 1]^A$) are immediate and omitted.
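The strategy of Proposition 4.1 and the geometric property used in its proof can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names, the rock-paper-scissors payoffs rescaled to $[0, 1]$, Nature's periodic sequence and the seed are all choices made for the example. Playing proportionally to the positive part of the cumulative regret is the same as playing proportionally to $\bar r_n^+$.

```python
import random


def regret_matching_distribution(regret):
    """Mixed action proportional to the positive part of the regret vector."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total <= 0.0:                      # no positive regret: play arbitrarily
        return [1.0 / len(regret)] * len(regret)
    return [r / total for r in positive]


def expected_regret(x, payoff, b):
    """Vector E_x[r(a, b)]: k-th component is rho(k, b) - rho(x, b)."""
    avg = sum(x[a] * payoff[a][b] for a in range(len(payoff)))
    return [payoff[k][b] - avg for k in range(len(payoff))]


def play_regret_matching(payoff, nature_actions, seed=0):
    """Run the strategy against a fixed sequence of Nature; return the average regret."""
    rng = random.Random(seed)
    n_actions = len(payoff)
    cumulative = [0.0] * n_actions
    for b in nature_actions:
        x = regret_matching_distribution(cumulative)
        a = rng.choices(range(n_actions), weights=x)[0]
        for k in range(n_actions):
            cumulative[k] += payoff[k][b] - payoff[a][b]
    return [c / len(nature_actions) for c in cumulative]


# Rock-paper-scissors rescaled to [0, 1]: payoff[a][b] for the player.
rps = [[0.5, 0.0, 1.0],
       [1.0, 0.5, 0.0],
       [0.0, 1.0, 0.5]]
# Nature plays rock 7 times out of 10 and paper otherwise (a fixed sequence).
nature = [0 if n % 10 < 7 else 1 for n in range(20_000)]
average_regret = play_regret_matching(rps, nature, seed=0)
```

For any mixed action `x` and any `b`, the inner product of `x` with `expected_regret(x, rps, b)` vanishes up to rounding, which is the orthogonality property of the proof; on the sequence above the largest component of `average_regret` is small, in line with external consistency.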

Remark 4.1 One might argue that with the exponential weight algorithm, the dependency in $A$ of the rates of convergence shrinks to $\sqrt{\log(A)}$ instead of $\sqrt{A}$, so the strategy we output might not be optimal. Actually this argument is flawed: rates of convergence are indeed optimal since we minimized the $\ell_2$-norm of the regret. It is only the $\ell_\infty$-norm of the regret that can be upper-bounded by $\sqrt{\log(A)/n}$.

Actually, the strategy of Hart & Mas-Colell is an approachability strategy of $\mathbb{R}^A_-$ driven by the potential $\Phi(z) = \|z^+\|_2^2$ (that represents the $\ell_2$-norm of the regret) while exponential weights are driven by the soft-max potential $\Phi(z) = \frac{1}{\eta} \log\big( \sum_{a \in A} e^{\eta z^a} \big)$, which is a twice differentiable surrogate of $\|z^+\|_\infty$. However, minimization of the infinite norm of the regret can also be reduced to approachability, see Proposition 4.2 below (following actually an idea of Blackwell [11]).



Proposition 4.2 Assume that Nature chooses outcome vectors $U \in [0, 1]^A$ and define the game with vector payoffs and target set defined as follows:

$$g(a, U) = (U^a, U) \in [0, 1] \times [0, 1]^A \quad \text{and} \quad C = \Big\{ (z, V) \in [0, 1] \times [0, 1]^A \ \text{s.t.}\ z \geq \max_{a \in A} V^a \Big\}.$$

Then any approachability strategy of $C$ (which is a convex approachable set) minimizes the $\ell_\infty$ norm of the external regret since

$$d_C(\bar g_n) \leq \|\bar r_n^+\|_\infty \leq \sqrt{2}\, d_C(\bar g_n).$$

Proof: Convexity of $C$ (which is actually a polytope, i.e., the intersection of a finite
number of half-spaces and a compact set) is a direct consequence of its definition since

$$C = \bigcap_{a \in A} \big\{(z,U) \text{ s.t. } z \ge U^a\big\} \cap [0,1] \times [0,1]^A.$$

Approachability of $C$ is immediate: for every $U \in \mathcal{U}$, choosing $a$ to be one of the highest
components of $U$ ensures that $g(a,U) = (U^a, U)$ belongs to $C$. It remains to prove the
inequalities.

Notice that if we denote by $(\bar z_n, \bar U_n)$ the average vector payoff at stage $n$, then $\bar U_n$
is the average outcome vector and $\bar z_n$ is the average actual payoff. As a consequence,
the $\ell_\infty$ norm of the regret, $\|\bar r_n^+\|_\infty = \max_{a \in A} \bar U_n^a - \bar z_n$, is exactly equal to the distance
between $(\bar z_n, \bar U_n)$ and $(\max_{a \in A} \bar U_n^a, \bar U_n)$. By definition, the latter belongs to $C$, therefore
one has $d_C(\bar g_n) \le \|\bar r_n^+\|_\infty$.

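Let $a^* \in \arg\max_{a \in A} \bar U_n^a$ and $(\tilde z_n, \tilde U_n) = \Pi_C(\bar z_n, \bar U_n)$, then

$$\|\bar r_n^+\|_\infty = \bar U_n^{a^*} - \bar z_n = \bar U_n^{a^*} - \tilde z_n + \tilde z_n - \bar z_n \le \|\bar U_n - \tilde U_n\| + |\tilde z_n - \bar z_n| \le \sqrt{2}\, d_C(\bar g_n),$$

where we used the fact that $\tilde z_n \ge \max_{a \in A} \tilde U_n^a$ (because $(\tilde z_n, \tilde U_n)$ belongs to $C$) and that $U \mapsto \max_{a \in A} U^a$ is 1-Lipschitz.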



Extensions to the case where $A \subset \mathbb{R}^d$ is a compact convex set are immediate as the
finiteness of $A$ is not used in the proof.

Actually, Blackwell proved this result in the finite case, where Nature chooses actions
in $B$; in that case, stage payoffs are $g'(a,b) = (\rho(a,b), \delta_b) \in \mathbb{R} \times \Delta(B)$ where, as usual,
$\Delta(B)$ is seen as a subset of $\mathbb{R}^B$. The target set is

$$C' = \big\{(z,y) \in \mathbb{R} \times \Delta(B) \text{ s.t. } z \ge \max_{a \in A} \rho(a,y)\big\}$$

and since $\|g'(a,b)\| \le \sqrt{2}$, approachability results imply that $\bar g'_n$ converges to $C'$ at the
rate of $\sqrt{2/n}$, thus expected regret is bounded in the order of $\sqrt{B/n}$ (because in this
framework, $y \mapsto \rho(a,y)$ is $\sqrt{B}$-Lipschitz and not 1-Lipschitz).

This shows that regret can be bounded, not only with respect to the number of the
player's actions (i.e. in $\sqrt{\log(A)/n}$), but also with respect to the number of Nature's actions (in $\sqrt{B/n}$).
This might lead to some improvement if the former is exponentially larger than the
latter.

Remark 4.2 In the compact case, usual proofs show that Blackwell's approachability
strategy ensures that $d_C(\bar g_n) \le \sqrt{\|g\|_\infty^2/n} = \sqrt{(A+1)/n}$. However, there exists a consistent
strategy such that $\|\bar r_n^+\|_\infty \le 3\sqrt{\log(A)/n}$. So this is an example where the optimal
dimension dependency of rates of approachability is not $\sqrt{\|g\|_\infty^2}$ but much smaller.

There are two possible explanations: either minimizing step by step the $\ell_2$ distance
(i.e. following Blackwell's strategy) is not optimal, or some important facts are hidden
within proofs. In Remark 4.1 we claimed that the answer was the first possibility: indeed,
the final objective was to minimize the $\ell_\infty$-distance, so minimizing the $\ell_2$ norm must
induce an additional dimension-dependent constant. This is not the case here, because
the final objective is within a constant of the $\ell_2$-distance.

An open and fair question is whether the dimension dependent term should depend
on the specific target set $C$ or not. In these examples, respective sizes of the target sets
$C$ within the set of feasible payoff vectors are rather intriguing. For instance, in the
framework of Proposition 4.1, the volume of $\mathrm{co}\{g(a,b)\}$ is $2^A$ times the volume of $C$
while it is only $A+1$ times the volume of $C$ in the framework of Proposition 4.2. This
has to be compared with the respective size of dimension dependent constants which were
$\sqrt{A}$ and $\sqrt{\log(A)}$.

We now turn to the minimization of internal regret $R_n = R(a_n, b_n)$. We recall that it
is an $A \times A$-matrix whose $(a,a')$ component is $\rho(a', b_n) - \rho(a, b_n)$ if $a = a_n$ and $0$ otherwise.
The generalization of Hart & Mas-Colell strategy will appeal to the concept of invariant
measures of matrices.

A probability distribution $\lambda \in \Delta(\{1, \dots, d\})$ is an invariant measure of some $d \times d$-matrix
$M$ with non-negative coefficients if

$$\sum_{k=1}^{d} \lambda_k M_{k,i} = \lambda_i \sum_{k=1}^{d} M_{i,k}, \qquad \forall i \in \{1, \dots, d\},$$

and their existence is a consequence of Perron-Frobenius theorem (this also generalizes
the usual invariant measure of Markov chains, see e.g. Seneta [69]).

Sorin [71], but also Hart & Mas-Colell [31] and Foster & Vohra [25], used the existence
of invariant measures to output a simple internally consistent strategy.
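One possible way to compute an invariant measure numerically is sketched below. It is an illustration only, not the construction used in the references: the matrix $M$ is turned into a "lazy" stochastic matrix $P = I + (M - \mathrm{diag}(\text{row sums}))/c$ with $c$ equal to twice the largest row sum, whose stationary distributions are exactly the invariant measures of $M$, and $P$ is then iterated from the uniform distribution. The helper names (`invariant_measure`, `invariance_gap`, `geometric_property`) and the fixed number of iterations are assumptions made for this sketch.

```python
def invariant_measure(M, iterations=5000):
    """Approximate an invariant measure of a non-negative square matrix M."""
    d = len(M)
    row_sums = [sum(row) for row in M]
    scale = 2.0 * max(row_sums)
    if scale == 0.0:
        return [1.0 / d] * d  # M = 0: every distribution is invariant
    # Lazy stochastic matrix: rows sum to one, diagonal entries >= 1/2.
    P = [[M[i][k] / scale for k in range(d)] for i in range(d)]
    for i in range(d):
        P[i][i] += 1.0 - row_sums[i] / scale
    lam = [1.0 / d] * d
    for _ in range(iterations):
        lam = [sum(lam[k] * P[k][i] for k in range(d)) for i in range(d)]
    total = sum(lam)
    return [v / total for v in lam]


def invariance_gap(M, lam):
    """Largest violation of sum_k lam_k M[k][i] = lam_i sum_k M[i][k]."""
    d = len(M)
    return max(
        abs(sum(lam[k] * M[k][i] for k in range(d)) - lam[i] * sum(M[i]))
        for i in range(d)
    )


def geometric_property(M, lam, U):
    """Inner product <M, E_lam[R(a, b)]> where U[a] is the payoff of action a."""
    d = len(M)
    return sum(M[i][k] * lam[i] * (U[k] - U[i]) for i in range(d) for k in range(d))
```

The last helper evaluates the inner product between $M$ and the expected internal regret matrix under $\lambda$; it vanishes (up to numerical error) for any payoff vector $U$ as soon as $\lambda$ is invariant, which is the orthogonality that makes an invariant measure of the positive part of the average regret an approachability strategy.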

Proposition 4.3 The strategy $\sigma$ that dictates to play at stage $n+1$ an invariant measure
of $(\bar R_n)^+$ (and arbitrarily if every component is non-positive) is internally consistent.
Moreover, for every strategy $\tau$ of Nature and $n \in \mathbb{N}$,



and, for every r? > 0, I^a,T |sup^> 



< - 
~ \ n 



> 7?| < 3exp (^—^^x) '^^ soon as > 1. 



Proof: As for external regret, we just need to prove that $\sigma$ is exactly Blackwell's
approachability strategy of the negative orthant. And again, this is a consequence of a
geometric property:

Any invariant measure $\lambda$ of any matrix $M$ with non-negative coefficients
satisfies, no matter the choice of $b \in B$, $\langle M, \mathbb{E}_\lambda[R(a,b)] \rangle = 0$.

Let $U^a := \rho(a,b)$, then the $(i,k)$-component of $\mathbb{E}_\lambda[R(a,b)]$ is $\lambda_i (U^k - U^i)$. So the inner
product is equal to $\sum_{i,k} M_{i,k} \lambda_i (U^k - U^i)$ and the coefficient before $U^i$ in this sum is

$$\sum_{k} \lambda_k M_{k,i} - \lambda_i \sum_{k} M_{i,k} = 0$$

since $\lambda$ is an invariant measure of $M$.

Since $x_{n+1} = \sigma(h^n)$ is an invariant measure of $\bar R_n^+$, geometric properties imply that

$\langle \bar R_n^+, \mathbb{E}_{\sigma,\tau}[R_{n+1}] \rangle = 0$, thus $\langle \bar R_n - \bar R_n^-, \mathbb{E}_{\sigma,\tau}[R_{n+1}] - \bar R_n^- \rangle = 0$.

This proves that $\sigma$ satisfies Blackwell's property, hence is an approachability strategy, and
bounds follow from Corollary 1.15. ■



Once again, using approachability theory to prove existence of internally consistent
strategies is immediate: the negative orthant satisfies Blackwell's property. An interesting
feature of this algorithm is the simple characterization of this optimal (for the
minimization of the $\ell_2$ norm) strategy.

Interestingly, the reduction from external to internal consistent strategies (see Section
2.1.3 or Stoltz & Lugosi [75]) run with the algorithm of Proposition 4.1 constructs exactly
the strategy of Proposition 4.3.

So both Propositions 4.1 and 4.3 can be unified into the following theorem that
deals more generally with $\Phi$-regret. It exhibits a strategy with the same complexity as
the previous internally consistent strategy, dictating to play at each stage an invariant
measure of some matrix. Given a family $\Phi$, we recall that $\Phi$-regret at stage $n$ is denoted
by $R_n^\Phi \in \mathbb{R}^\Phi$ and defined by

$$R_n^\Phi = \big( \rho(\phi(a_n), b_n) - \rho(a_n, b_n) \big)_{\phi \in \Phi}.$$

Finally, given $M \in \mathbb{R}^\Phi_+$, let $\Theta^\Phi(M)$ be the $A \times A$-matrix whose $(a,a')$ component is

$$\Theta^\Phi(M)^{a,a'} = \sum_{\phi \in \Phi :\, \phi(a) = a'} M^\phi.$$



Theorem 4.1 Let $\Phi$ be a family of swap mappings. The strategy playing at stage $n+1$
accordingly to any invariant measure of $\Theta^\Phi\big((\bar R_n^\Phi)^+\big)$ has no $\Phi$-regret. Moreover, for every
strategy $\tau$ of Nature and $n \in \mathbb{N}$,



Rf 



Rt 



< 




with = max 
n aeA 



G $ s.t. 



and, for every r] > 0,F„^r \^supp^^n R^^ > r?| < 3 exp (^-^^ 



as soon as 



> 1. 



Proof: The proof follows closely the ones of Propositions 4.1 and 4.3. Indeed, one just
has to prove that this strategy is an approachability strategy of $\mathbb{R}^\Phi_-$, using the following
geometric property:

Any invariant measure $\lambda$ of any matrix $\Theta^\Phi(M)$ with non-negative
coefficients satisfies, no matter the choice of $b \in B$, $\langle M, \mathbb{E}_\lambda[R^\Phi(a,b)] \rangle = 0$.

Indeed, if one denotes $U = \rho(\cdot,b)$, then

$$\langle M, \mathbb{E}_\lambda[R^\Phi(a,b)] \rangle = \sum_{\phi \in \Phi} M^\phi \sum_{a \in A} \lambda^a \big(U^{\phi(a)} - U^a\big) = \sum_{a \in A} \lambda^a \sum_{a' \in A} \ \sum_{\phi :\, \phi(a) = a'} M^\phi \big(U^{a'} - U^a\big)$$

$$= \sum_{a \in A} \lambda^a \sum_{a' \in A} \Theta^\Phi(M)^{a,a'} \big(U^{a'} - U^a\big) = \sum_{a \in A} \Big( \sum_{a' \in A} \lambda^{a'} \Theta^\Phi(M)^{a',a} - \lambda^a \sum_{a' \in A} \Theta^\Phi(M)^{a,a'} \Big) U^a = 0$$

since $\lambda$ is an invariant measure of $\Theta^\Phi(M)$. As a consequence, this strategy is exactly
Blackwell's approachability strategy of the negative orthant. The result comes from
the fact that $R^\Phi(a,b)$ has at most $A_\Phi$ non-zero components, each one in $[-1,1]$, thus
$\|R^\Phi(a,b)\|^2 \le A_\Phi$. ■

Remark 4.3 As usual, if techniques from approachability in infinite dimension are used
instead of regular approachability, the term in $\sqrt{A}$ for external and internal regret or
$\sqrt{A_\Phi}$ for $\Phi$-regret can be replaced by respectively $\sqrt{\log(A)}$ or $\sqrt{A \log(A)}$, up to some
constant.

4.1.2 From approachability to regret: the infinite case

We turn in this section to the case where the action set $A$ is no longer finite but some convex
compact metric set and, at stage $n \in \mathbb{N}$, Nature chooses a mapping $U_n : A \to [0,1]$ in
a set $\mathcal{U}$ of equicontinuous mappings. We show how previous results can be extended to
this compact case (indeed, the Arzelà-Ascoli theorem ensures that $\mathcal{U}$ is relatively compact).

Theorem 4.2 In this compact case, there exists a strategy without $\Phi_c$-regret, where $\Phi_c$
is the set of continuous mappings from $A$ to itself.

Proof: Consider an auxiliary game where action sets of the player and Nature are $A$ and $\mathcal{U}$.
Choices of $a \in A$ and $U \in \mathcal{U}$ generate a payoff $U[a] \in \mathcal{L}^2(\Phi_c, \lambda)$, where $\lambda$ is some fixed
probability distribution over $(\Phi_c, \|\cdot\|_\infty)$ endowed with the Borel $\sigma$-field, defined by

$$U[a](\phi) := U(\phi(a)) - U(a), \qquad \forall \phi \in \Phi_c.$$

The convex set $C = \mathcal{L}^2_-(\Phi_c, \lambda) := \{U \in \mathcal{L}^2(\Phi_c, \lambda) \text{ s.t. } U \le 0\}$ is not excludable by
Nature; indeed, for any $U \in \mathcal{U}$, there exists $a \in A$ (any global maximizer of $U$) such
that $U[a]$ belongs to $C$. Thus it is approachable by the player, and any approachability
strategy has no $\phi$-regret, for $\lambda$-almost all mappings $\phi \in \Phi_c$.

However, $\Phi_c$ is separable (see Rudin [67] or Stoltz & Lugosi [76]), so there exists
$\{\phi_k;\ k \in \mathbb{N}\}$ a countable dense subset of $\Phi_c$; the corresponding probability $\lambda$ we consider
is $\lambda = \sum_{k \in \mathbb{N}} 2^{-k} \delta_{\phi_k}$. Since $\mathcal{U}$ is a family of equicontinuous mappings, every mapping
$U \in \mathcal{U}$ shares the same modulus of continuity $\omega(\cdot)$; this means that, for every $\varepsilon > 0$, there
exists $\delta := \omega(\varepsilon)$ such that if $d(a,a') \le \delta$ then $|U(a) - U(a')| \le \varepsilon$, for any mapping $U \in \mathcal{U}$.
Given $\phi \in \Phi_c$ there exists $\phi_k$ such that $\|\phi - \phi_k\|_\infty \le \delta$, thus

$$\frac{1}{n} \sum_{m=1}^{n} \Big( U_m(\phi(a_m)) - U_m(a_m) \Big) \le \frac{1}{n} \sum_{m=1}^{n} \Big( U_m(\phi_k(a_m)) - U_m(a_m) \Big) + \varepsilon.$$

Since $\sigma$ has no $\phi_k$-regret, its $\phi$-regret is asymptotically smaller than $\varepsilon$, for every $\varepsilon > 0$,
thus it has no $\Phi_c$-regret. ■



Corollary 4.4 Conclusions of Theorem 4.2 hold if $\mathcal{U}$ is the convex hull of a finite set
of upper-semicontinuous mappings bounded from below and $\Phi$ is the set of constant
mappings.

Proof: Every $U \in \mathcal{U}$ is upper-semicontinuous over a compact set, so it admits a maximum.
Therefore $\mathcal{U}$ is uniformly bounded and the set $C$ is approachable, with respect to some
probability distribution $\lambda$ that remains to be defined.

Denote by $U_1, \dots, U_m$ the extreme points of $\mathcal{U}$. As they are upper-semicontinuous
and bounded, there exists a countable subset $\{a_k;\ k \in \mathbb{N}\} \subset A$ such that, for every $\varepsilon > 0$
and every $a \in A$, there exists $a_k$ satisfying $U_i(a_k) \ge U_i(a) - \varepsilon$, for every $i \in \{1, \dots, m\}$.
Define $\lambda$ as any probability measure whose support is exactly this countable subset.

The rest of the proof follows the one of Theorem 4.2. ■



In the finite case, approachability theory not only provides a quick and easy proof of
the existence of consistent strategies, but also exhibits explicitly some of them. In fact, playing somehow
proportionally to the positive part of the regret is still externally consistent in the compact
case. Let $\lambda$ be any positive probability measure on $\{a_k;\ k \in \mathbb{N}\}$, a countable dense
subset of $A$, and denote by $\bar r_n[a_k]$ the external regret at stage $n$ induced by action $a_k$.

Consider the strategy that chooses $a_k$ at stage $n+1$ with probability $\bar r_n^+[a_k] \lambda_k \big/ \sum_{\ell} \bar r_n^+[a_\ell] \lambda_\ell$.
Then, as in the finite case, one can easily show that the geometric property holds, i.e.,

$$\langle \mathbb{E}[r_{n+1}], \bar r_n^+ \rangle = \frac{1}{\sum_\ell \bar r_n^+[a_\ell] \lambda_\ell} \cdot \sum_{k,\ell} \big(U_{n+1}(a_\ell) - U_{n+1}(a_k)\big)\, \bar r_n^+[a_k]\, \bar r_n^+[a_\ell]\, \lambda_k \lambda_\ell = 0.$$

Approachability in infinite dimension (along with the density argument) ensures that
this strategy has no external regret.

Concerning $\Phi$-regret, one cannot simply play accordingly to any invariant measure
of some infinite dimensional matrix, as their existence is not ensured. However, it is still
possible to discretize $A$ finitely to get a $\Phi$-regret smaller than $\varepsilon$, with $\varepsilon$ arbitrarily small
(or even equal to 0, if $\varepsilon$ is taken as a decreasing sequence, see Proposition 1.7).

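Let $\omega_{\mathcal{U}}(\cdot)$ be the common modulus of continuity of $U \in \mathcal{U}$ and $A_\varepsilon$ a finite $\omega_{\mathcal{U}}(\varepsilon)$-grid of
$A$. For any $\phi \in \Phi$, we define $\hat\phi : A \to A_\varepsilon$ by $\hat\phi(a) = \arg\min_{a' \in A_\varepsilon} d(\phi(a), a')$ with ties broken
arbitrarily. As a consequence, for every $a \in A$, $U \in \mathcal{U}$ and non negative $q \in \mathcal{L}^2(\Phi_c, \lambda)$,

$$\Big| \int q(\phi)\, U(\phi(a))\, d\lambda - \int q(\phi)\, U(\hat\phi(a))\, d\lambda \Big| \le \varepsilon.$$

We define, for any $(a,a')$, $\Theta[q]^{a,a'} := \int_{\Phi_{a,a'}} q\, d\lambda$, where $\Phi_{a,a'} := \{\phi \in \Phi \text{ s.t. } \hat\phi(a) = a'\}$.

Let $x$ be any invariant measure of the matrix $\Theta[q]$, then one has

$$\langle q, U[x] \rangle = \sum_{a \in A_\varepsilon} x^a \sum_{a' \in A_\varepsilon} \Theta[q]^{a,a'} \big(U(a') - U(a)\big) + \sum_{a \in A_\varepsilon} x^a \int q(\phi) \big( U(\phi(a)) - U(\hat\phi(a)) \big)\, d\lambda \le \varepsilon.$$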



This proves that $\mathcal{L}^2_-(\Phi_c, \lambda)$ is a $B$-set, hence approachable. We can only claim that the
strategy we exhibited has some flavors of invariant measures.

4.2 Using regret to get calibration

We show in this section that finite calibration can easily be understood in terms of
internal regret. The first idea goes to Foster & Vohra [23] and it has been somehow
clarified by Sorin [71]. Recall that, in finite calibration, Nature chooses at stage $n$ an
outcome $\omega_n \in \Omega$. The player formulates a prediction on $\omega_n$ by choosing a probability
distribution $p[\ell_n] \in \Delta(\Omega)$ that must belong to a finite grid $\{p[\ell];\ \ell \in \mathcal{L}\}$.

Theorem 4.3 There exists a strategy $\sigma$ calibrated with respect to the grid $\{p[\ell];\ \ell \in \mathcal{L}\}$,
such that, no matter the strategy $\tau$ of Nature,



lEcr,r 

where S{C) 



sup ■ 



n 



\uJn[i] - P[i]\ 



min — I 



< 6 



log(L) 



n 



so 



sup ■ 



I^NnMIAl- 



n 



\^n[i] -p[^]\\ - ™-n||a;„[^] - p[k]\ 



< 



6 /log(L) 



n 



inf£^fcg£ the diameter of the grid. 



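Proof: The proof uses the fact (simply obtained by expanding sums) that, for any
sequence $\omega_m$ and every $\ell, k \in \mathcal{L}$,

$$\sum_{m \in \mathbb{N}_n[\ell]} \frac{\|\omega_m - p[\ell]\|^2 - \|\omega_m - p[k]\|^2}{|\mathbb{N}_n[\ell]|} = \|\bar\omega_n[\ell] - p[\ell]\|^2 - \|\bar\omega_n[\ell] - p[k]\|^2.$$

Now consider the game with action spaces $\mathcal{L}$ and $\Omega$ where choices of $\ell$ and $\omega$ generate
the payoff $\rho(\ell,\omega) = -\|\omega - p[\ell]\|^2$. An internally consistent strategy satisfies, by definition,

$$\limsup_{n \to \infty}\ \sup_{\ell, k \in \mathcal{L}}\ \frac{|\mathbb{N}_n[\ell]|}{n} \sum_{m \in \mathbb{N}_n[\ell]} \frac{\|\omega_m - p[\ell]\|^2 - \|\omega_m - p[k]\|^2}{|\mathbb{N}_n[\ell]|} \le 0.$$

So this, along with the basic fact, shows that any internally consistent strategy is
calibrated with respect to the grid $\{p[\ell];\ \ell \in \mathcal{L}\}$. Rates of convergence follow from those
of internal consistency. ■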



We stress that we proved a stronger result than required: the calibration score
converges almost surely to zero, at a rate independent of Nature's strategy.
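The "basic fact" used in the proof (the average of differences of squared distances equals the difference of squared distances at the average outcome) can be checked numerically. The following sketch is an illustration only; the helper names (`sq_dist`, `average`, `averaged_payoff_gap`, `gap_at_average`) are hypothetical and the outcomes are taken as vertices of a simplex.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def average(vectors):
    """Coordinate-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def averaged_payoff_gap(outcomes, p_l, p_k):
    """Average, over the stages where p_l was predicted, of the payoff gap."""
    return sum(sq_dist(w, p_l) - sq_dist(w, p_k) for w in outcomes) / len(outcomes)


def gap_at_average(outcomes, p_l, p_k):
    """Same gap, evaluated at the empirical average outcome."""
    w_bar = average(outcomes)
    return sq_dist(w_bar, p_l) - sq_dist(w_bar, p_k)
```

Both quantities coincide because the quadratic terms in the outcomes cancel, leaving an expression that is affine in the outcome; this is what turns internal regret for the payoff $-\|\omega - p[\ell]\|^2$ into a calibration score.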

Remark 4.4 This proof of calibration highlights the following fact. It does not really
matter that $\omega_m$ belongs to a finite set $\Omega$ and that $p_n$ are probability distributions over $\Omega$.
Indeed, one can just assume that sequences $\omega_n$ and $p_n$ belong to some compact set of a
Euclidean space $\mathbb{R}^d$. Similarly, given two finite families of predictions $\{p[\ell] \in \mathbb{R}^d;\ \ell \in \mathcal{L}\}$
and weights $\{u[\ell] \in \mathbb{R};\ \ell \in \mathcal{L}\}$, we recall that weighted calibration is defined as

$$\big(\|\bar\omega_n[\ell] - p[\ell]\|^2 - u[\ell]\big) - \min_{k \in \mathcal{L}} \big(\|\bar\omega_n[\ell] - p[k]\|^2 - u[k]\big)$$

and the exact same proof (yet with $\rho(\ell,\omega) = -\|\omega - p[\ell]\|^2 + u[\ell]$) gives the existence of
weighted calibrated strategies. Rates of convergence are identical except that the constant
6 is replaced with $6 + 3\max_{\ell \in \mathcal{L}} |u[\ell]|$.

Notice that we defined finite calibration with respect to the uniform norm of the
positive part of $\big( \|\bar\omega_n[\ell] - p[\ell]\|^2 - \min_{k \in \mathcal{L}} \|\bar\omega_n[\ell] - p[k]\|^2 \big)_{\ell \in \mathcal{L}}$. And this quantity is upper
bounded optimally by the exponential weight algorithm. We could as well have defined
calibration in terms of the $\ell^2$ norm of this vector and, as in regret minimization, playing
an invariant measure could then improve bounds.

The next proposition states that, quite surprisingly, there exist $\varepsilon$-calibrated strategies
with rates of convergence independent of $\varepsilon$ (and even of $\Omega$ for a slightly weaker notion).

Proposition 4.5 For every $\varepsilon > 0$, there exists a grid $\{p[\ell];\ \ell \in \mathcal{L}\}$ and a strategy $\sigma$ such
that, no matter the strategy $\tau$ of Nature and for every $n \in \mathbb{N}$,



sup 



n 



< w- . 

n 



Moreover, this strategy is $\varepsilon$-calibrated, with a rate of convergence independent of $\varepsilon$, since
one also has, for every $n \in \mathbb{N}$,



sup 



\^n\p,e]\ 



n 



< 



n 



, with 7(0) < (20^ 



Proof: Let $\varepsilon$ be fixed; the strategy considered is simply a calibrated strategy with
respect to some well chosen grid of $\Delta(\Omega)$. Recall that $\Delta(\Omega)$ is written as the following
subset of $\mathbb{R}^d$ with $d = \Omega - 1$:

$$\Delta(\Omega) := \Big\{ q = (q_1, \dots, q_d) \in \mathbb{R}^d \text{ s.t. } q_1, \dots, q_d \ge 0 \text{ and } \sum_{k=1}^{d} q_k \le 1 \Big\}.$$

Denote by $e_k$ the unit vector of $\mathbb{R}^d$ whose components are all zero except the $k$-th which
is one. The regular grid considered is indexed by $\mathcal{L}_\varepsilon$ and defined by

$$\Big\{ \sum_{k=1}^{d} n_k \frac{2\varepsilon}{\sqrt{d}} e_k \in \Delta(\Omega);\ n_k \in \mathbb{N} \Big\} =: \Big\{ p[\ell] = \sum_{k=1}^{d} n_k[\ell] \frac{2\varepsilon}{\sqrt{d}} e_k;\ \ell \in \mathcal{L}_\varepsilon \Big\}.$$

Given a point $p[\ell]$ of the grid, its neighbors are points $p[\ell']$ such that $n_k[\ell] = n_k[\ell']$ for
every $k \in \{1, \dots, d\}$ except for exactly one $k_0$ which is such that $|n_{k_0}[\ell] - n_{k_0}[\ell']| = 1$.

So if we denote by $\mathcal{N}[\ell] \subset \mathcal{L}_\varepsilon$ the neighbors of $p[\ell]$, it contains at most $2d$ elements.
The basic idea behind the specific geometry of this grid is that

any point $q \in \Delta(\Omega)$ is closer to $p[\ell]$ than to any other $p[\ell']$ if and only if
it is closer to $p[\ell]$ than to any of its neighbors.

Consider the game introduced in the proof of Theorem 4.3 except that choices of $\ell_n \in \mathcal{L}_\varepsilon$
and $\omega_n$ generate an internal regret $R_n$ whose $(\ell,\ell')$-th component is

otherwise 

As a consequence, using the simple fact concerning averages of norms, 



/ 




2 




I 







\R!^\\ < 4— and i?! 
yd 



n 



Same arguments as in the proof of Proposition 4.3 yield that playing, at stage $n+1$, any
invariant measure of $(\bar R_n)^+$ ensures that $\mathbb{E}_{\sigma,\tau}\big[\|(\bar R_n)^+\|^2\big] \le 16\varepsilon^2/(dn)$.

It remains to relate $\|\bar\omega_n[\ell] - p[\ell]\|$ to $\bar R_n$. First, we write $\bar\omega_n[\ell] = p[\ell] + \sum_{k=1}^{d} \lambda_k e_k$

where we can assume (up to a change of sign) that every $\lambda_k$ is positive and even
$\lambda_k \ge \varepsilon/\sqrt{d}$ (otherwise $\bar\omega_n[\ell]$ is even closer to $p[\ell]$).

We denote by $p[\ell_k] = p[\ell] + 2\varepsilon e_k/\sqrt{d}$ the neighbor of $p[\ell]$ in the direction of $e_k$, so
that $e_k = (p[\ell_k] - p[\ell])/\|p[\ell_k] - p[\ell]\|$. Triangle inequality implies that



Vd 



e + 



e + 



e + 



A£\ + ^ ^n[i\ 



k=l 



p[4] - p[ 



k=l 



\k=i\ ^ 



\Wk\-pm\ 



e + 



e + 



[pn\£\-pm -Pn\£\-P\£k\ 



\ k=l 

Vd 



mh]-p[m 



4e 



\ k=l 



I' - ||zj„M-p[4]||' 



To sum up, we have proved that, for every $\ell \in \mathcal{L}_\varepsilon$,



(^\\ZJ^[£]-p[i]f-\\uJ^[£]-p[i,]fy 



1 2
Multiplying both sides of this inequality by $|\mathbb{N}_n[\ell]|/n$, taking the square and summing over
$\ell \in \mathcal{L}_\varepsilon$, one obtains



Rn 



E 



n 



< 



d 



therefore the strategy a ensures that 



sup 















Rn 


2. 



< 



This gives the first part of the proof. The last part is due to the fact that there are fewer
than $(2\sqrt{d})^d$ points in the $\varepsilon$-ball centered at some $p \in \Delta(\Omega)$. ■



Proposition 4.5 also allows to recover the following result of Mannor & Stoltz [50].

Theorem 4.4 When $\mathcal{F}$ is the family of all Borel subsets of $\Delta(\Omega)$, there exists an
$\mathcal{F}$-calibrated strategy $\sigma$ such that, for every strategy $\tau$ of Nature,



\Nn[F]\ 



n 



< 7n n+i , Par-a-s-, 



and, for every $\delta > 0$, with probability at least $1 - \delta$, one also has



n 



p^[F]-u:n[F\ 



< ^ + 21 



'log (I) 



n 



, T^^r-a-s.. 



Proof: The result is a consequence of a doubling trick applied to strategies constructed
in Proposition 4.5. Assume that the strategy adapted to some $\varepsilon$ is played during $N$
stages. On those stages, one has



\^n[F]\ 



n 



p„|F]-J5„[F] 



llNnMI 



n 



Rn 



Taking expectation and using the fact that L;, < e yield that 

\^n[F]\ 



n 



p^[F]-u:n[F\ 



1 

Hence, the doubling trick adapted to the sequences Sk = (^) '^^^ , played during 2^ stages 
ensures that, denoting n = 2^° + m < 2^'^'^^ , 



Eo-,T 



\^n[F]\ 



n 



p^[F]-uJn[F] 



/ko-1 



fc=0 
ko 



< 



2^0 /-^ 



ud+l 
''d+2 < 



fc=0 



d+1 fco+ 
2d+2 - 1 2 d+2 



1 1 

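nd+2
High probability bounds are classical consequences of concentration inequalities, since

$$\frac{|\mathbb{N}_n[F]|}{n} \big\| \bar\omega_n[F] - \bar p_n[F] \big\| = \Big\| \sum_{m=1}^{n} Y_m \Big\| \Big/ n = \|\bar Y_n\|,$$

where the sequence $Y_n = (\omega_n - p_n)\, \mathbb{1}\{p_n \in F\}$ is such that $\|Y_n\| \le 2$ and, by Jensen's inequality,
$\|\bar Y_n\| \le \|\bar Y_n - \mathbb{E}[\bar Y_n]\| + \mathbb{E}[\|\bar Y_n\|]$. ■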
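The bookkeeping behind the doubling trick used in this proof can be sketched as follows: stage $n = 2^{k} + m$ (with $0 \le m < 2^{k}$) belongs to block $k$, which lasts $2^{k}$ stages and on which the strategy tuned to some accuracy $\varepsilon_k$ is played. The code is an illustration only; the helper names are hypothetical and the accuracy sequence is left as a user-supplied function `eps_of_block`, since it is not specified here.

```python
def block_of_stage(n):
    """Return (k, m) such that n = 2**k + m with 0 <= m < 2**k, for n >= 1."""
    if n < 1:
        raise ValueError("stages are numbered from 1")
    k = n.bit_length() - 1
    return k, n - (1 << k)


def doubling_schedule(num_stages, eps_of_block):
    """List (stage, block, accuracy) for stages 1..num_stages.

    eps_of_block is any user-supplied map from a block index to the
    accuracy of the strategy played during that block.
    """
    schedule = []
    for n in range(1, num_stages + 1):
        k, _ = block_of_stage(n)
        schedule.append((n, k, eps_of_block(k)))
    return schedule
```

For instance, `doubling_schedule(4, lambda k: 2.0 ** (-k))` restarts the strategy at stages 1, 2 and 4, with a smaller accuracy parameter on each new block.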



In fact, Theorem 4.4 slightly improves the result of Mannor & Stoltz [50] since it
implies that



lim supnf^+i ■ 

n— >oo 



n 



< 7, Po-T-as. 



Rakhlin, Sridharan & Tewari [66] wrote the calibration problem in terms of a
generalized regret, see Section 2.2.2. Formally, assume that action spaces are respectively
$\Delta(\Omega)$ and $\Omega$, and that the stage game payoff is null, i.e. $g(p,\omega) = 0$. The class of departure
functions considered is $\{\zeta_{p,\lambda};\ p \in \Delta(\Omega), \lambda > 0\}$ where $\zeta_{p,\lambda} : \Delta(\Omega) \times \Omega \to \mathbb{R}^\Omega$ and
the evaluation mappings $B_n : (\mathbb{R}^\Omega)^n \to \mathbb{R}$ are defined by, for every $n \in \mathbb{N}$,



^p,Ab](Pn,Wn) = H\\Pn - P\\l < A}(p„ - and BniZi,.. .,Zn) 



1 " 

n 



m=l 



As a consequence, one easily has that regret is upper bounded by calibration score, as 



sup BnUp,x[a]{Pl,^l), ■■■ ,^p,x[a]{Pn,OJn)) - Bn[gipi,UJl), g{Pn,^^n) 
p,X ^ 



: sup ■ 

p,\ 



n 



UJnlp, A] -Pn\p,X] 



The max-min formulation of the regret minimization problem (see Section 2.2.2) proves



that sup. 



'p,X 



WAr[p, A] - Pn[P,>] 



^ l\f[p, X]/N is upper bounded at the final stage by 



cQ'^ ■\/log{N)/N where c is a universal constant. An alternative (and actually more 
general) proof is given in the next section. 



4.3 Using Approachability to get (smooth and generalized) Calibration

In this section, we show that recent results in calibration can be rewritten solely as the
existence or construction of some approachability strategy. The first result we exhibit is
a generalization of both a previous one of Perchet [60] (since the strategy is calibrated
with respect to much larger families) and Rakhlin, Sridharan & Tewari [66] (because the
proof is constructive and not horizon dependent).

Theorem 4.5 Let $\mathcal{F} := \{B_\infty(p,\lambda);\ p \in \Delta(\Omega), \lambda > 0\}$ be the family of $\ell_\infty$-balls. Then

there exists a calibrated strategy $\sigma$ such that, no matter the strategy $\tau$ of Nature and for
every $n \in \mathbb{N}$,



sup 

p6A(n),A>0 



|Wn,[p,A]| 



n 



wnb, A] -p„b, A] 



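Proof: Let $\varepsilon > 0$ be fixed. As in the proof of Proposition 4.5, the set $\Delta(\Omega)$ is represented
as a subset of $\mathbb{R}^d$ (with $d = \Omega - 1$),

$$\Delta(\Omega) = \Big\{ p = (p_1, \dots, p_d) \in \mathbb{R}^d \text{ s.t. } p_1, \dots, p_d \ge 0 \text{ and } \sum_{k=1}^{d} p_k \le 1 \Big\}$$

and we consider the regular $\varepsilon$-grid $\mathcal{L}_\varepsilon$ defined by

$$\Big\{ \sum_{k=1}^{d} 2 n_k \varepsilon\, e_k \in \Delta(\Omega);\ n_k \in \mathbb{N} \Big\} =: \Big\{ p[\ell] = \sum_{k=1}^{d} 2 n_k[\ell] \varepsilon\, e_k;\ \ell \in \mathcal{L}_\varepsilon \Big\}.$$

Although the family of $\ell_\infty$-balls is infinite, the number of different possible intersections
of such a ball with the grid is obviously finite (it is trivially bounded by its number
of subsets, $2^{L_\varepsilon}$). However, an $\ell_\infty$-ball $B_\infty(p,\lambda)$ is rectangular and can be described by
two extreme points: the lowest corner $p - \sum_{k=1}^{d} \lambda e_k$ and the highest corner (in every
direction) $p + \sum_{k=1}^{d} \lambda e_k$.

The grid $\mathcal{L}_\varepsilon$ is regular, so this characterization holds for intersections with $\ell_\infty$-balls:
they are characterized by two extreme points. As a consequence, there are at most
$L_\varepsilon^2 \le \varepsilon^{-2d}$ different possible intersections. Consider a fixed family of $\ell_\infty$-balls that
induce exactly these different intersections, and denote it $\{B_\infty(p[k], \lambda_k);\ k \in \mathcal{K}\}$.
We introduce an auxiliary game with action spaces $\mathcal{L}_\varepsilon$ and $\Omega$, payoff mapping

$$g(\ell,\omega) = \Big( \mathbb{1}\{p[\ell] \in B_\infty(p[k], \lambda_k)\}\, (\omega - p[\ell]) \Big)_{k \in \mathcal{K}}$$

and consider the closed and convex target set $C := B_\infty(0,\varepsilon) \subset (\mathbb{R}^d)^{\mathcal{K}}$.

Given $q \in \Delta(\Omega)$, the pure action $\ell$ corresponding to a point of the grid $p[\ell]$ such
that $\|p[\ell] - q\|_\infty \le \varepsilon$ ensures that $g(\ell,q)$ belongs to $C$ which is therefore approachable.
Moreover, since $C$ is rectangular, the approachability strategy of Corollary 1.16 adapted
to the potential $\Phi(z) = \frac{1}{\eta} \log\Big( \sum_{k \in \mathcal{K}} \sum_{i=1}^{d} \big( e^{\eta z^{k,i}} + e^{-\eta z^{k,i}} \big) \Big)$ ensures that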



\9n\ 



log(2dK) ^ log(2d) + 2(ilog(l/e) 



7]n 



rjn 



Therefore, given G W such that A^ > 2ed, the choice of e/4 = r] 
ensures in particular that 



8^l0g(i) 



\9n\ 



d 



<6W^log — 



A^ 



A^ 



2d
As usual, when playing by blocks of increasing size $2^m$ (starting at $m$ such that $2^m \ge ed$),
the last two displays ensure that, for every $n \in \mathbb{N}$,



d 



< 12W2-log — . 



n 



2n 



d 



The result comes from the fact that, by construction, for every $n \in \mathbb{N}$,

\^n[p,X]\ 



sup 

peA(n),A>o 



n 



t^„[p. A] -Pn[P, A] 



\\9n\ 



If $d \ge 3$, since $L_\varepsilon \le \varepsilon^{-d}/d!$, constants in Theorem 4.5 can be lowered if one is
only interested in the asymptotic behavior. This result holds almost surely since, using
concentration inequalities, with $\mathbb{P}_{\sigma,\tau}$ probability at least $1 - \delta$,



/ d^ /2n\ 2d, f2n\ 1 , /I 

<12J2-log — +2.— log — +-log - 



\ d J y n \ d J n \d ^ 
The statement concerns $\ell_\infty$-balls; however, it is also possible to show that for other $\ell_p$-balls,

the number of possible intersection with the grid is bounded by O (^)'^^^ ^ (see e.g. 
Rakhlin, Sridharan & Tewari [66]). Thus the result holds, up to some polynomial term
in $\Omega$, for any other $\ell_p$-norm.

This technique could actually have been used to prove Theorem 4.4, a similar result
with respect to the family of Borel sets. The difference is that the number of possible
intersections between Borel sets and our grid would have been in the order of $2^{1/\varepsilon^d}$. After
taking the logarithm, equalizing the three remaining terms in regret $\varepsilon$, $\eta$ and $1/(\varepsilon^d \eta n)$
yields that $\varepsilon = \eta = n^{-1/(d+2)}$. This would have been the bound on expected regret.

We now turn to calibration with checking rules and smooth calibration, and we show 
that they can be reduced to approachability problems. We recall that given a pair of 
mappings U and T, we defined 

$$\mathbb{N}_n[\mathcal{U},\mathcal{T}] = \big\{ m \le n \text{ s.t. } (p_m, \omega_m) \in \mathcal{U}(h^{m-1}) \big\},$$

the empirical probability of tested events 

^-^''^^ = imTni ' 

and the average predicted conditional probability of tested events 



If a checking rule is independent of current predictions, then the same definitions hold
with $(p_m, \omega_m) \in \mathcal{U}(h^{m-1})$ (resp. in $\mathcal{T}(h^{m-1})$) replaced by $\omega_m \in \mathcal{U}(h^{m-1})$ (resp. in $\mathcal{T}(h^{m-1})$).

Theorem 4.6 Let $\lambda$ be a probability distribution on the set of checking rules independent
of current predictions. Then there exists a deterministic strategy $\sigma$ that is calibrated with
$\lambda$-almost every checking rule, i.e. such that, $\mathbb{P}_{\sigma,\tau}$-almost surely,

$$\limsup_{n \to \infty} \big| \bar\omega_n[\mathcal{U},\mathcal{T}] - \bar p_n[\mathcal{U},\mathcal{T}] \big| \le 0,$$

as soon as $|\mathbb{N}_n[\mathcal{U},\mathcal{T}]|$ increases to infinity.



Proof: The proof relies essentially on approachability with activation in infinite dimension.
We define an auxiliary game where the payoff is a random variable over the set of checking
rules independent of current predictions. The action set of the player is reduced to $\Delta(\Omega)^\circ$,
the interior of $\Delta(\Omega)$ (so that conditional probabilities are well defined), and the payoff at
stage $n$ is $\mathbb{1}\{\omega_n \in \mathcal{T}(h^{n-1})\} - p_n\{\mathcal{T}(h^{n-1}) \,|\, \mathcal{U}(h^{n-1})\}$ if the coordinate $(\mathcal{U},\mathcal{T})$ is active,
i.e., if $n \in \mathbb{N}_n[\mathcal{U},\mathcal{T}]$.

By definition, the average payoff at stage $n$ is exactly $\bar\omega_n[\mathcal{U},\mathcal{T}] - \bar p_n[\mathcal{U},\mathcal{T}]$. We shall
construct a strategy $\sigma$ that approaches the convex set $\{0\}$, that is, using Theorem 1.6,
find $p_{n+1} \in \Delta(\Omega)^\circ$ such that, for every $\omega \in \Omega$,

l{w GZY(/i")}(l{w G r(/i")} -p„+i{r(/i")|^/(/i")} 



n - -pju,np |iN„,.[z.,ni ^"^ 

is less than or equal to zero (or at least smaller than $\varepsilon_n = 1/n^2$).

To construct this $p_{n+1}$, we consider the game with payoff defined on $\Delta(\Omega)^\circ$ and $\Omega$ by

^^^'^^ = / ^"fi|]i|^[^"r]i^ nhn}-pmhnmhn})dx 

and $g$ is extended linearly in its second variable on $\Delta(\Omega)$. Since one always has

the integrals in the last two displayed equations coincide, so we just need to prove that
there exists $p_{n+1} \in \Delta(\Omega)^\circ$ such that $g(p_{n+1},\omega) \le \varepsilon_n$, for every $\omega \in \Omega$ or, more generally,
that

$$\inf_{p \in \Delta(\Omega)^\circ}\ \sup_{\omega \in \Omega}\ g(p,\omega) \le 0.$$

And this is a consequence of Lemma 5.1 since $g(p,p) = 0$ for every $p \in \Delta(\Omega)^\circ$, $g(p,\cdot)$ is
affine and $g(\cdot,p)$ is continuous on $\Delta(\Omega)^\circ$. ■



When checking rules might depend on current predictions (see Sandroni, Smorodinsky
& Vohra [68] or Foster, Rakhlin, Sridharan & Tewari [22]), the result and proof are almost
identical.

Proposition 4.6 Let $\lambda$ be a probability distribution on the set of checking rules. Then
there exists a strategy $\sigma$ that is calibrated with $\lambda$-almost every checking rule, i.e. such that,
$\mathbb{P}_{\sigma,\tau}$-almost surely,

$$\limsup_{n \to \infty} \big| \bar\omega_n[\mathcal{U},\mathcal{T}] - \bar p_n[\mathcal{U},\mathcal{T}] \big| \le 0,$$

as soon as $|\mathbb{N}_n[\mathcal{U},\mathcal{T}]|$ increases to infinity.

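Proof: The proof is almost identical to the case of checking rules independent of predictions.
The only difference lies in the definition of the payoff $g(p,\omega)$ which is

$$g(p,\omega) = \int \frac{\bar\omega_n[\mathcal{U},\mathcal{T}] - \bar p_n[\mathcal{U},\mathcal{T}]}{1 + |\mathbb{N}_n[\mathcal{U},\mathcal{T}]|}\ \mathbb{1}\{(\omega,p) \in \mathcal{U}(h^n)\} \Big( \mathbb{1}\{(\omega,p) \in \mathcal{T}(h^n)\} - p\{\mathcal{T}(h^n) \,|\, \mathcal{U}(h^n)\} \Big)\, d\lambda.$$

Since $g(\cdot,\omega)$ might not be continuous, Lemma 5.1 does not apply. However, $g$ is bounded
and defined over $\Delta(\Omega)^\circ$ and $\Omega$, the former being measurable and the latter finite.
Therefore, see Sorin [70], Theorem A.9, this game has a value in mixed actions. And this value
has to be smaller than 0 since $g(p,p) = 0$ for every $p \in \Delta(\Omega)^\circ$. ■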



The last similar reduction to approachability concerns smooth calibration. 



Theorem 4.7 There exists a deterministic strategy $\sigma$ of the player such that, no matter
Nature's strategy, for every continuous mapping $g : \Delta(\Omega) \to \mathbb{R}_+$,

$$\limsup_{n \to \infty}\ \frac{1}{n} \Big\| \sum_{m=1}^{n} g(p_m)\, (\omega_m - p_m) \Big\| \le 0.$$



The same result holds if one adds checking rules independent of current predictions. 

Proof: The set of continuous mappings from $\Delta(\Omega)$ to $\mathbb{R}_+$ is separable and we denote
by $\lambda$ a probability distribution with support $\{g_k;\ k \in \mathbb{N}\}$, a dense countable family.
Following the lines of the proof of Theorem 4.6, we define

un[9.] = and -pM - 



n 



n 



Then, Corollary 1.4 ensures the existence of an approachability strategy such that, for
every $k \in \mathbb{N}$, $\|\bar\omega_n[g_k] - \bar p_n[g_k]\|$ converges to zero. Indeed, one just has to prove that
$\{0\} \subset \mathcal{L}^2$ is approachable, thus that for every $n \in \mathbb{N}$, there exists $p_{n+1} \in \Delta(\Omega)$ such
that, no matter $\omega \in \Omega$,

$$\int \Big\langle \bar\omega_n[g_k] - \bar p_n[g_k],\ g_k(p_{n+1})\, (\omega - p_{n+1}) \Big\rangle\, d\lambda \le 0,$$

where we assumed that $0/0 = 0$. The existence of such $p \in \Delta(\Omega)$ is again a consequence
of Ky Fan's inequality generalized in Lemma 5.4.

Since $\{g_k;\ k \in \mathbb{N}\}$ is a dense family, necessarily $\bar\omega_n[g] - \bar p_n[g]$ must converge to
zero, for every continuous mapping $g$. ■



A close look at the first proof of existence of deterministic smooth calibrated strategies,
due to Kakade & Foster [37], shows that they also constructed an $\varepsilon$-approachability
strategy (and then used a doubling trick). We proposed here a direct (and maybe more
intuitive) proof.

4.4 Using calibration to get regret and approachability 

Calibration in some auxiliary game can be seen as a useful tool to construct strategies
that satisfy another criterion such as approachability, no internal regret and so on. This
idea goes back to Foster & Vohra [23] and was used, recently, by Perchet [59], [61], [62]:
in particular, it is useful in a specific case of general regret (see Section 2.2.2) defined
below.

But first, we focus on usual internal regret in the finite case (although it can be
generalized immediately when $B$ is any compact set). Recall that a strategy is internally
consistent if the supremum limit of

$$\frac{|\mathbb{N}_n[a]|}{n} \Big( \max_{a^* \in A} \rho\big(a^*, \bar b_n[a]\big) - \rho\big(a, \bar b_n[a]\big) \Big)$$

is non positive. By linearity of $\rho(a,\cdot)$, this quantity can be immediately rewritten into

$$\frac{|\mathbb{N}_n[a]|}{n} \Big( \big( \|\rho(a,\cdot) - \bar b_n[a]\|^2 - \|\rho(a,\cdot)\|^2 \big) - \min_{a^* \in A} \big( \|\rho(a^*,\cdot) - \bar b_n[a]\|^2 - \|\rho(a^*,\cdot)\|^2 \big) \Big)$$

up to a factor 2. As a consequence, any weighted-calibrated strategy with respect to
$\{\rho(a,\cdot), \|\rho(a,\cdot)\|^2;\ a \in A\}$ is internally consistent. Since scores are actually exactly the
same, rates of convergence of weighted calibration give rates for regret minimization.
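The rewriting "up to a factor 2" can be verified numerically. In the sketch below, which is only an illustration, each action $a$ is identified with its payoff vector $\rho(a,\cdot) \in \mathbb{R}^B$ (a row of a hypothetical matrix `payoff_rows`), and the weighted calibration term is compared with twice the regret term at an arbitrary average state $\bar b$; all function names are assumptions.

```python
import random


def inner(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def regret_term(payoff_rows, a, b_bar):
    """max over a* of rho(a*, b_bar) minus rho(a, b_bar), rho(a, .) linear."""
    best = max(inner(row, b_bar) for row in payoff_rows)
    return best - inner(payoff_rows[a], b_bar)


def weighted_calibration_term(payoff_rows, a, b_bar):
    """Weighted score with predictions rho(a, .) and weights ||rho(a, .)||^2."""
    def score(row):
        return sq_dist(row, b_bar) - inner(row, row)
    return score(payoff_rows[a]) - min(score(row) for row in payoff_rows)
```

Expanding the squares shows that `weighted_calibration_term` equals exactly twice `regret_term`, because $\|\rho(a,\cdot) - \bar b\|^2 - \|\rho(a,\cdot)\|^2 = -2\rho(a,\bar b) + \|\bar b\|^2$ and the term $\|\bar b\|^2$ cancels in the difference.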

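We now turn to generalized regret. Assume that $\mathcal{A}$ and $\mathcal{B}$ are two compact and
convex sets and let $G : \mathcal{A} \times \mathcal{B} \to \mathbb{R}$ be any fixed evaluation mapping that might not be
linear in any of its coordinates. In this framework, a strategy has no $G$-external regret if

$$\limsup_{n \to \infty} \sup_{a^* \in \mathcal{A}} G(a^*, \bar b_n) - G(\bar a_n, \bar b_n) \le 0.$$

To define internal regret, assume that a strategy only uses a finite number of actions
$\{a[\ell];\ \ell \in \mathcal{L}\} \subset \mathcal{A}$, so that $\sigma$ is actually a mapping from the set of finite histories into
$\mathcal{L}$, and $\ell_n = \ell$ means that action $a[\ell]$ is played at stage $n$. Define

$$N_n[\ell] = \{m \le n \ \text{s.t.}\ \ell_m = \ell\} \quad \text{and} \quad \bar b_n[\ell] = \frac{\sum_{m \in N_n[\ell]} b_m}{|N_n[\ell]|}.$$

A strategy has no $(\mathcal{L}, \varepsilon)$-internal regret if, no matter the strategy $\tau$ of Nature, $P_{\sigma,\tau}$-almost
surely,

$$\limsup_{n \to \infty} \frac{|N_n[\ell]|}{n}\Big(\sup_{a^* \in \mathcal{A}} G(a^*, \bar b_n[\ell]) - G(a[\ell], \bar b_n[\ell]) - \varepsilon\Big) \le 0, \quad \forall \ell \in \mathcal{L}.$$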
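
As an illustration only, the quantity appearing in this definition can be evaluated on a finite history. The Python sketch below assumes real-valued actions and states, and it replaces the supremum over $a^*$ by a maximum over a finite list `candidates`; the function name `internal_regret` and its arguments are hypothetical.

```python
def internal_regret(G, actions, labels, states, candidates, eps=0.0):
    """For each label l, return (|N_n[l]|/n) * (max_a G(a, b_bar[l]) - G(a[l], b_bar[l]) - eps).

    actions    : dict mapping a label l to the action a[l]
    labels     : labels l_1, ..., l_n chosen by the player
    states     : states b_1, ..., b_n chosen by Nature (real numbers here)
    candidates : finite list standing in for the action set (an assumption)
    """
    n = len(labels)
    regrets = {}
    for label in set(labels):
        on_label = [b for l, b in zip(labels, states) if l == label]
        b_bar = sum(on_label) / len(on_label)        # average state on N_n[l]
        best = max(G(a, b_bar) for a in candidates)  # proxy for the supremum
        regrets[label] = len(on_label) / n * (best - G(actions[label], b_bar) - eps)
    return regrets

# Example with the (non-linear) evaluation G(a, b) = -(a - b)^2.
G = lambda a, b: -(a - b) ** 2
example = internal_regret(G, {0: 0.0, 1: 1.0}, [0, 0, 1, 1],
                          [0.0, 1.0, 1.0, 1.0], [0.0, 0.5, 1.0])
```

In this example the stages labelled 0 have average state 1/2, against which action 0 is suboptimal, whereas the stages labelled 1 carry no regret.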

Proposition 4.7 If $G$ is continuous, then for every $\varepsilon > 0$ there exists an $(\mathcal{L}, \varepsilon)$-internally
consistent strategy. However, there might not exist any ($\varepsilon$-)externally consistent strategy.

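Proof: Since $G$ is continuous, for every $\varepsilon > 0$, there exists some $\delta > 0$ such that $\|(a,b) - (a',b')\| \le \delta$
implies that $|G(a,b) - G(a',b')| \le \varepsilon/2$. Consider $\sigma'$ any calibrated strategy
with respect to $\{b[\ell];\ \ell \in \mathcal{L}\}$, a $\delta/2$-grid of $\mathcal{B}$. Assume that when $\sigma'$ predicts $b[\ell]$, then
$\sigma$ dictates to play $a[\ell] \in \arg\max_{a \in \mathcal{A}} G(a, b[\ell])$.

Since $\sigma'$ is calibrated, for every $\eta > 0$, one has that, $P_{\sigma',\tau}$-as, after some stage $N$,

$$\sup_{\ell \in \mathcal{L}} \frac{|N_n[\ell]|}{n}\Big(\|\bar b_n[\ell] - b[\ell]\|^2 - \frac{\delta^2}{4}\Big) \le \eta.$$

In particular, for every $\ell \in \mathcal{L}$, either $\|\bar b_n[\ell] - b[\ell]\| < \delta$, or $|N_n[\ell]|/n$ is smaller than $4\eta/3\delta^2$.
The first case implies that $G(a, b[\ell]) - G(a, \bar b_n[\ell]) \le \varepsilon/2$ for every $a \in \mathcal{A}$; in the second case the
regret term is bounded by $2\|G\|_\infty$. Thus in both cases one has that, after stage $N$,

$$\sup_{\ell \in \mathcal{L}} \frac{|N_n[\ell]|}{n}\Big(\sup_{a^* \in \mathcal{A}} G(a^*, \bar b_n[\ell]) - G(a[\ell], \bar b_n[\ell]) - \varepsilon\Big) \le \frac{8\|G\|_\infty}{3\delta^2}\,\eta,$$

which characterizes an $(\mathcal{L}, \varepsilon)$-internally consistent strategy.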

It remains to prove that there might not exist externally consistent strategies. Define
$G(a,b) = (1 - 4b)a$ for every $a \in [0,1]$ and $b \in [0,1]$, and assume that during the first $N$
stages (with $N$ large enough) $b_n = 0$. Necessarily $\bar a_N$ is arbitrarily close to 1. During
the next $N$ stages, let $b_n = 1$, so that $\bar b_{2N} = 1/2$ and $G(a, \bar b_{2N}) = -a$. Since $\bar a_{2N} \ge \bar a_N / 2$ is
then (almost) at least 1/2, the external regret is (almost) at least 1/2. ■
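
This counterexample can be replayed numerically. The sketch below, in Python, takes the most favourable behaviour for the player (playing 1 during the first $N$ stages, then 0); the helper `external_regret` and the finite list `candidates` standing in for $[0,1]$ are assumptions made for the illustration.

```python
def G(a, b):
    """Evaluation mapping of the counterexample."""
    return (1 - 4 * b) * a

def external_regret(actions, states, candidates):
    """max_a G(a, b_bar) - G(a_bar, b_bar), the maximum being taken over `candidates`."""
    n = len(actions)
    a_bar = sum(actions) / n
    b_bar = sum(states) / n
    return max(G(a, b_bar) for a in candidates) - G(a_bar, b_bar)

N = 1000
grid = [0.0, 0.5, 1.0]                      # stand-in for the action set [0, 1]

# First N stages: Nature plays 0 and the player answers optimally with 1.
regret_at_N = external_regret([1.0] * N, [0.0] * N, grid)

# Next N stages: Nature plays 1; even playing 0 cannot undo the first N stages.
regret_at_2N = external_regret([1.0] * N + [0.0] * N, [0.0] * N + [1.0] * N, grid)
```

The regret vanishes after $N$ stages but equals 1/2 after $2N$ stages, although the player reacted as well as possible in the second half.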



We now show how to construct an $\varepsilon$-approachability strategy via calibration. Given
a closed and convex set $C \subset \mathbb{R}^d$ and a vector payoff mapping $g : \Delta(\mathcal{A}) \times \Delta(\mathcal{B}) \to \mathbb{R}^d$,
define $G(x,y) = -d_C(g(x,y))$ for every $x \in \Delta(\mathcal{A})$ and $y \in \Delta(\mathcal{B})$. If $C$ is approachable,
then Blackwell's condition ensures that $\sup_{x^* \in \Delta(\mathcal{A})} G(x^*, y) = 0$ for every $y \in \Delta(\mathcal{B})$. By
convexity of $d_C$ and the triangle inequality,

$$d_C(\bar g_n) \le \sum_{\ell \in \mathcal{L}} \frac{|N_n[\ell]|}{n}\, d_C\big(g(x[\ell], \bar b_n[\ell])\big) + \sum_{\ell \in \mathcal{L}} \frac{|N_n[\ell]|}{n}\, \big\|g(x[\ell], \bar b_n[\ell]) - \bar g_n[\ell]\big\|.$$

Both sums converge almost surely to zero, respectively because $\sigma$ has no internal regret
(with respect to $G$) and because of concentration inequalities, since $g(x[\ell], b_n) =
E[g(a_n, b_n)]$. One can resort to the doubling trick (since we can easily derive uniform
speed of convergence) to get an approachability strategy.

4.5 Using regret to get approachability 

We showed in the last section how calibration and generalized regret can be used to
construct approachability strategies, as noticed by Perchet [59] or Rakhlin, Sridharan &
Tewari [66]. A completely different link can also be formulated between regret and
approachability, as discovered recently by Abernethy, Bartlett & Hazan [1]. We recall
that Blackwell's strategy consists in playing, at stage $n+1$, optimally in the zero-sum
projected game $\langle g(x,y) - \pi_C(\bar g_n),\ \bar g_n - \pi_C(\bar g_n)\rangle$. Abernethy, Bartlett & Hazan [1] proposed
to use a regret minimization scheme to determine, stage by stage, in which projected
game to play (i.e., not necessarily along the direction $\bar g_n - \pi_C(\bar g_n)$).

The formulation is rather simple when $C = \{0\} \subset \mathbb{R}^d$, so we will focus only on this
case. It can however be generalized to any convex cone and therefore to any convex set
in $\mathbb{R}^d$ (seen as a section of a convex cone in $\mathbb{R}^{d+1}$). The basic idea is to notice that, for
$C = \{0\}$ and every $n \in \mathbb{N}$,

$$d_C(\bar g_n) = \|\bar g_n\| = \sup_{\theta \in B(0,1)} \langle \theta, \bar g_n \rangle.$$

Assume that at stage $m$, the player played optimally in the projected game along the
direction $\theta_{m-1}$. Since $C$ is approachable, this zero-sum game has a non-positive value, hence
$\langle \theta_{m-1}, E[g_m] \rangle \le 0$. As a consequence,

$$E\big[d_C(\bar g_n)\big] = E\|\bar g_n\| \le E\Big[\sup_{\theta \in B(0,1)} \langle \theta, \bar g_n \rangle - \frac{1}{n}\sum_{m=1}^{n} \langle \theta_{m-1}, g_m \rangle\Big].$$

The term inside the expectation can be written as the external regret if the player's and
Nature's action sets are respectively $B(0,1)$ and $\{g(a,b);\ (a,b) \in \mathcal{A} \times \mathcal{B}\}$. As a consequence,
an approachability strategy can indeed be described as a two-step procedure. At any
stage $n$, choose, in a first step, a direction $\theta_n \in B(0,1)$ following any regret minimization
algorithm. Then, in a second step, play optimally in the projected zero-sum game on $\theta_n$.

Blackwell's strategy dictates to choose (in the first step) the direction $\theta_n$ that maximizes
$\langle \theta, \bar g_n \rangle$; in other words, this is precisely the follow-the-leader algorithm, which does
not guarantee a shrinking regret (in full generality). The key point to understand this
feature is that, by definition of the second step, $\langle \theta_m, E[g_{m+1}] \rangle$ is always non-positive
(no matter the choice of $\theta_m$); so, in this auxiliary game, Nature is in fact very restricted
in her choice of actions and, what is even more intricate, these restrictions depend on
the player's move.
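
To make the two-step procedure concrete, here is a minimal Python sketch in a toy game that is not taken from the text: $\mathcal{A} = \mathcal{B} = \{0,1\}$, $d = 1$ and $g(a,b) = a - b$, so that $B(0,1) = [-1,1]$ and the optimal action in the game projected on $\theta$ is pure ($a = 0$ if $\theta > 0$, $a = 1$ otherwise). The first step uses projected online gradient ascent with step $1/\sqrt{m}$, which is only one possible regret minimization algorithm; the name `approach_zero` is hypothetical.

```python
import math
import random

def approach_zero(nature_moves):
    """Return the average payoff of the two-step procedure in the toy game g(a, b) = a - b."""
    theta = 0.0      # current direction, kept in B(0, 1) = [-1, 1]
    total = 0.0      # running sum of the payoffs g_m
    for m, b in enumerate(nature_moves, start=1):
        # Second step: optimal play in the game projected on theta,
        # which guarantees theta * g(a, b) <= 0 whatever b is.
        a = 0 if theta > 0 else 1
        g = a - b
        total += g
        # First step for the next stage: gradient ascent on theta -> theta * g,
        # followed by a projection onto [-1, 1].
        theta = max(-1.0, min(1.0, theta + g / math.sqrt(m)))
    return total / len(nature_moves)

rng = random.Random(0)
moves = [rng.randint(0, 1) for _ in range(10_000)]
average = approach_zero(moves)
```

Since $\theta_{m-1} g_m \le 0$ at every stage, $|\bar g_n|$ is bounded by the average regret of the first step, which is at most $3/\sqrt{n}$ by the standard analysis of projected gradient ascent (diameter 2, gradients bounded by 1).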
5 Appendix



5.1 Game Theory lemma 

The following lemma generalizes Ky Fan's inequality [19], recalled below:

Let $\mathcal{K}$ be a convex compact set of some Euclidean space and $g : \mathcal{K} \times \mathcal{K} \to \mathbb{R}$ such that,
for every $y \in \mathcal{K}$, $g(\cdot, y)$ is concave over $\mathcal{K}$ and for every $x \in \mathcal{K}$, $g(x, \cdot)$ is continuous over
$\mathcal{K}$. If $g(x,x) = 0$ for every $x \in \mathcal{K}$, then there exists $x_0 \in \mathcal{K}$ such that $\sup_{x \in \mathcal{K}} g(x, x_0) \le 0$.



Lemma 5.1 Let $g$ be a mapping on some compact and convex set $X \subset \mathbb{R}^d$ such that
$g(x,x) = 0$ for every $x \in \mathring{X}$, the interior of $X$ (such a mapping is called anti-symmetric).

If for every $x \in \mathring{X}$, $g(\cdot, x)$ is concave and $g(x, \cdot)$ is continuous and uniformly bounded
by some $M > 0$ on $\mathring{X}$, then

$$\inf_{x} \sup_{x'} g(x', x) \le 0.$$

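Proof: Without loss of generality, we assume that 0 belongs to $\mathring{X}$ and we denote, for
every $\varepsilon > 0$ small enough, the convex compact set $X_\varepsilon := \{(1-\varepsilon)x;\ x \in X\}$. Then
$g$ and $X_\varepsilon$ satisfy the assumptions of Ky Fan's inequality. Thus, there exists $x_\varepsilon$ such that
$g(x, x_\varepsilon) \le 0$ for every $x \in X_\varepsilon$.

Given $x \in X$, we denote by $x_-$ the point on the boundary of $X$ in the opposite
direction of $x$, i.e., such that $x_-/\|x_-\| = -x/\|x\|$. Also define $\|X_-\| = \inf_{x \in X} \|x_-\|$.

Since $g(\cdot, x_\varepsilon)$ is concave, for every $x$ in $X$ that is not in $X_\varepsilon$, one has

$$\frac{g(x, x_\varepsilon) - g((1-\varepsilon)x, x_\varepsilon)}{\varepsilon\|x\|} \le \frac{g((1-\varepsilon)x, x_\varepsilon) - g(x_-, x_\varepsilon)}{(1-\varepsilon)\|x\| + \|x_-\|},$$

therefore, since $(1-\varepsilon)x \in X_\varepsilon$ and $g(\cdot, x_\varepsilon) \le 0$ on $X_\varepsilon$, one has

$$g(x, x_\varepsilon) \le -\frac{\varepsilon\|x\|}{(1-\varepsilon)\|x\| + \|x_-\|}\, g(x_-, x_\varepsilon) \le \varepsilon\,\frac{M\|X\|}{(2-\varepsilon)\|X_-\|}.$$

Hence the result, since the right-hand term goes to 0 as $\varepsilon$ decreases to 0. ■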



5.1.1 Uniform concentration inequalities 

The following lemmas are central in different proofs. We recall that a process $Z_t \in \mathbb{R}^d$
is a martingale difference sequence if $E[Z_{t+1} | Z_1, \dots, Z_t] = 0$. Moreover, if $\|Z_t\|_2 \le K$
then Hoeffding-Azuma's inequality in Euclidean spaces (see Corollary 3.5 in Kallenberg
& Sztencel [38]) yields that, for every integer $T \ge 1$,

JP{IFHI>4< (l + /^^)e-p(-^|2^') <2exp(-^|^e2), (14) 
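or, $P\{\|\bar Z_T\| \ge \phi^{-1}(\delta)/\sqrt{T}\} \le \delta$ with $\phi(x) := (1 + x/K)\exp(-x^2/(2K^2))$. Actually, a
weak maximal version of this inequality holds:

$$P\{\exists t \le T,\ t\|\bar Z_t\| \ge T\varepsilon\} \le \phi(\sqrt{T}\varepsilon) \quad \text{or} \quad P\{\exists t \le T,\ t\|\bar Z_t\| \ge \sqrt{T}\,\phi^{-1}(\delta)\} \le \delta.$$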

For d = 1, one can define (/){x) = 2exp (—2:^/2) and 4>{x) = 2exp (— otherwise. 
Stronger maximal inequalities for averages of martingale differences exist: 

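Lemma 5.2 Let $Z_t$ be a martingale difference sequence with $\|Z_t\| \le K$; then, for every
$\delta > 0$ and every integer $T \ge 1$,

$$P\Big\{\exists t \le T,\ \|\bar Z_t\| \ge \frac{2}{\sqrt{t}}\,\phi^{-1}\Big(\frac{\delta t}{4T}\Big)\Big\} \le \delta.$$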

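Proof: Define $\varepsilon_t = 2\phi^{-1}(\delta t/4T)/\sqrt{t}$, which is non-increasing in $t$. Using a peeling argument, one obtains

$$\begin{aligned}
P\{\exists t \le T,\ \|\bar Z_t\| \ge \varepsilon_t\}
&\le \sum_{m=0}^{\lfloor \log_2(T) \rfloor} P\Big\{\bigcup_{t=2^m}^{2^{m+1}-1} \{\|\bar Z_t\| \ge \varepsilon_t\}\Big\} \\
&\le \sum_{m=0}^{\lfloor \log_2(T) \rfloor} P\Big\{\bigcup_{t=2^m}^{2^{m+1}} \{\|\bar Z_t\| \ge \varepsilon_{2^{m+1}}\}\Big\} \\
&\le \sum_{m=0}^{\lfloor \log_2(T) \rfloor} P\Big\{\bigcup_{t=2^m}^{2^{m+1}} \{t\|\bar Z_t\| \ge 2^m \varepsilon_{2^{m+1}}\}\Big\} \\
&\le \sum_{m=0}^{\lfloor \log_2(T) \rfloor} \phi\Big(\sqrt{2^{m+1}}\,\frac{\varepsilon_{2^{m+1}}}{2}\Big)
 = \sum_{m=0}^{\lfloor \log_2(T) \rfloor} \frac{\delta\, 2^{m+1}}{4T}
 \le \frac{\delta\, 2^{\log_2(T)+2}}{4T} = \delta.
\end{aligned}$$

Hence the result.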
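
The thresholds $\varepsilon_t = 2\phi^{-1}(\delta t/4T)/\sqrt{t}$ used in this proof are easy to compute. The Python sketch below inverts $\phi$ by bisection on its decreasing branch $x \ge K$ (an implementation choice, valid for levels in $(0,1)$) and estimates, on $\pm 1$ random walks with $K = 1$ and $d = 1$, how often the envelope is crossed; the function names and the simulation parameters are assumptions made for the illustration.

```python
import math
import random

def phi(x, K=1.0):
    """phi(x) = (1 + x/K) * exp(-x^2 / (2 K^2))."""
    return (1 + x / K) * math.exp(-x * x / (2 * K * K))

def phi_inverse(delta, K=1.0):
    """Solution x >= K of phi(x) = delta, for 0 < delta < 1 (phi decreases on [K, +inf))."""
    lo, hi = K, 2 * K
    while phi(hi, K) > delta:      # enlarge the bracket until phi(hi) <= delta
        hi *= 2
    for _ in range(100):           # plain bisection
        mid = (lo + hi) / 2
        if phi(mid, K) > delta:
            lo = mid
        else:
            hi = mid
    return hi

def envelope(t, T, delta, K=1.0):
    """Threshold epsilon_t = 2 * phi^{-1}(delta * t / (4 T)) / sqrt(t)."""
    return 2 * phi_inverse(delta * t / (4 * T), K) / math.sqrt(t)

def violation_frequency(T, delta, runs, seed=0):
    """Fraction of +/-1 random walks whose running average crosses the envelope before T."""
    rng = random.Random(seed)
    radii = [envelope(t, T, delta) for t in range(1, T + 1)]
    bad = 0
    for _ in range(runs):
        s = 0
        for t in range(1, T + 1):
            s += rng.choice((-1, 1))
            if abs(s) / t >= radii[t - 1]:
                bad += 1
                break
    return bad / runs

frequency = violation_frequency(T=100, delta=0.1, runs=200)
```

The estimated frequency should stay below $\delta$; in this particular example the envelope is far from tight, so crossings are very rare.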



Similarly, maximal inequalities can be derived for tail events: 

Lemma 5.3 Let $Z_t \in \mathbb{R}^d$ be a martingale difference sequence with $\|Z_t\| \le K$; then, for
every $\varepsilon > 0$ and every integer $T \ge 1$,



Pja t > T, \\Zt\\ > e} < 4exp 



The exponential dependency in T can be reduced since one has, as soon as > 1, 
f{3,>T+1. l|z,||>.}<f!L;g + ^]exp(-||l). 



2i?2 
Proof: Again, using a peeling argument, one obtains

+00 

P{3t>r, ||Zi||>4<^p{ U {\\Zt\\>e}} 



+00 2'"+ It 



171=0 t=2'^T 

+00 



<^P| y {t\\Zt\\ > 2"^re}} < </> (^V2"*+iT-) < / 0(^72^ 



log(2) 



m=0 



00 



log(2) 



T£2 



hence 



3t>T, \\Zt\\ > e 



^}-log(2) (i 



+ 



V2 



T£2 



exp 



III] 

8K^ 



and the first part of the result follows. 

The second part of the proof follows from the facts that 

+00 

P{3t>r + 1, \\Zt\\>e}< p|||Zt||>e} 



t=T+l 
+00 



< 



t=T+l 



1 t 



< 



< 



2^2 roo 



¥2 



exp 



xe 



2K2 

2 



2i^2 



dx 



K2 



.(n + u^) exp{—u^/2)du, 



and 



00 2 



(n + n )e 2 dii = (1 + x)e ^ + j e 2(iu<(l + xH — je 2 



5.2 Probability lemmas 

Lemma 5.4 Lei /„, G L2{^, fJ,,T) such that YlneK ll/nlP/'^ < °° '^'^^ ll/n+i — /» 
i/ien /„ converges to 0, fi-as. 

Proof: First, we prove a weaker version when ||/n|| < for every n G UNT. 
Let $M_n = [n^{6/5}]$ and $k_n$ be the integer minimizing $\|f_k\|$ over $[M_n + 1, M_{n+1}]$. Then,

II J, ||2 , ,r Mn + l 



II ||2 < Ek=M^+l 

I I n^n I I 71 /T 

Mn+1 — 
M„+i 

< E 

fc=M„+l 



ll/fcl 



< 



Mn+l 



E 



jvin + l II p II 2 + 1 ^ 

1.1/6 — 1,6/5 ■ 



< 



fc=M„+l 



fcV6 



fc=A/„+l 



Therefore, $\sum_{n \in \mathbb{N}} \|f_{k_n}\|^2 < \infty$ and Fatou's lemma ensures that $f_{k_n}$ converges to 0, $\mu$-as.
Let us define $h_k = f_k - f_{k_n}$; then for every $k \ge k_n$ (and similarly for $k < k_n$)



3=kn + l 



< < 



Mr, 



Mr, 



M, 



n.+l 



Mr, 



Mr, 



< oo smce 



\\hk\\ 

Summing over k, one gets 
fceM new 

So both $h_k$ and $f_{k_n}$ converge $\mu$-as to 0 and thus so does $f_k = h_k + f_{k_n}$.

In the general case, one just need to notice that there exists an increasing sequence 
/3„ > 1 such that J2n&JN f^nWfnW'^ < OO and to define M„+i 
proof follows as before, since Mn+i — Mn ■ - -g- 



n 



and thus M„+i/(M„+i - M„) ~ /3. 



M„ + 1. The 



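Lemma 5.5 Let $C$ be a product set in $L^2(\Omega, \mu, \mathcal{F})$ and assume that for every $n \in \mathbb{N}$
i) $g_n \in L^2(\Omega, \mu, \mathcal{F})$ is bounded by $B \in L^2(\Omega, \mu, \mathcal{F})$
ii) $\chi_n \in L^2(\Omega, \mu, \mathcal{F})$ takes value in $\{0, 1\}$
iii) $\bar g_{\chi,n} = \sum_{m=1}^{n} \chi_m g_m / S_n$ where $S_n = \sum_{m=1}^{n} \chi_m$
iv) $\Big\langle \chi_{n+1}\big(\bar g_{\chi,n} - \Pi_C(\bar g_{\chi,n})\big),\ \frac{g_{n+1} - \Pi_C(\bar g_{\chi,n})}{S_{n+1}} \Big\rangle \le \varepsilon_n$ for a sequence of non-negative
$\varepsilon_n$ such that $\sum_{n \in \mathbb{N}} \varepsilon_n < \infty$.

Then $\bar g_{\chi,n}$ converges to $C$, $\mu_\infty$-as, where $\mu_\infty(\cdot) = \mu(\cdot \cap \{\lim S_n = \infty\})$.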
Proof: Let $f_n = \bar g_{\chi,n} - \Pi_C(\bar g_{\chi,n})$; then point iv) implies that:



ll/n+lf < ll/nf 



2 ( Xn+l -J"' ,fn ) + 
Xn+1 



Xn 



+1 
Since both $\bar g_{\chi,n+1}$ and $g_n$ are bounded by $B$ and $\chi_n \in \{0, 1\}$, one has:

2 Un+l^, fn) < ll/nf " ||/n+lf + 4 / B\c.)d^^i,J) + S. 

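Notice that, for every $\omega$, $\sum_{n \in \mathbb{N}} \chi_{n+1}(\omega)/S^2_{n+1}(\omega) \le \sum_{n \ge 1} 1/n^2 = \pi^2/6$. For every
$n \in \mathbb{N}$ and $\omega \in \Omega$, let $j(n,\omega) := \inf\{m \in \mathbb{N},\ S_m(\omega) = n\}$ be the first stage at which
$S_m(\omega)$ reaches $n$ (if it exists, otherwise it is $\infty$). Define $\tilde f_n(\omega) := f_{j(n,\omega)}(\omega)$, with
$f_\infty(\omega) = 0$, so that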

new new ^ "^-^ ' new 
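
The bound $\sum_n \chi_{n+1}/S^2_{n+1} \le \pi^2/6$ used in this proof holds because each stage with $\chi = 1$ increases $S$ by one, so the non-zero terms are exactly $1, 1/4, 1/9, \dots$ The short Python sketch below checks it on arbitrary 0/1 sequences; the function name `weighted_sum` is hypothetical.

```python
import math
import random

def weighted_sum(chi):
    """Sum of chi_n / S_n^2 over a 0/1 sequence, where S_n = chi_1 + ... + chi_n."""
    s = 0
    total = 0.0
    for c in chi:
        s += c
        if c:                      # only the stages with chi_n = 1 contribute
            total += 1 / (s * s)
    return total

rng = random.Random(0)
random_chi = [rng.randint(0, 1) for _ in range(5000)]
bound = math.pi ** 2 / 6
```

Whatever the positions of the ones, `weighted_sum(random_chi)` only depends on how many ones there are, and it stays below `bound`.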

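Since $C$ is a product set, projection on $C$ is a coordinate-wise projection, thus

$$|\tilde f_{n+1}(\omega) - \tilde f_n(\omega)| \le 2\,\big|\bar g_{\chi, j(n+1,\omega)}(\omega) - \bar g_{\chi, j(n,\omega)}(\omega)\big| \le \frac{4B(\omega)}{n+1}$$

and so $\|\tilde f_{n+1} - \tilde f_n\|^2 \le 16\|B\|^2/(n+1)^2$.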

Using Lemma 5.4 with $\beta_n$ an increasing sequence such that $\sum_n \beta_n \|\tilde f_n\|^2/n$ converges



and Mn- 



+1 



, we obtain the $\mu$-a.s. convergence of $\tilde f_n$. As a consequence, after
restriction to the event $\{\lim S_n = \infty\}$, $f_n$ converges $\mu$-as to zero.



Being a product set is used to bound $\|\tilde f_{n+1} - \tilde f_n\|$; convexity of $C$ (nor actually its
$L^2$-boundedness, as defined by Lehrer) is not enough for this proof. The reason
is that if we define the mapping $\tilde g_n \in L^2(\Omega, \mu, \mathcal{F})$ by $\tilde g_n(\omega) = \bar g_{j(n,\omega)}(\omega)$ and let $N :=
j(n,\omega)$, then, without the product property which induces a coordinate-wise projection,
$\Pi_C(\tilde g_n)(\omega)$ has no reason to be equal to $\Pi_C(\bar g_N)(\omega)$.

Corollary 5.6 The same results hold if $\chi_n \in [0, 1]$ does not necessarily take values in $\{0, 1\}$.

Proof: Assume that $g_0 = 0$ and $\chi_0 = S_0 = 1$; then $\sum_{n \ge 0} \chi_n^2/S_n^2$ is uniformly bounded.
Indeed, define $k_n = \min\{m \ \text{s.t.}\ S_m \ge n\}$, so that



E 

n=0 



,2 
n 

Si 



i = E E §?< 

n=l m=kn 



n=l 



m=kn 



n=l 



m=kn 



n=l 



Therefore, on {lim5.„ oo}, 5fj^,„ = I]m=o '^^ S'm/^m converges to C, and g^^n - gx,n 
converges to zero. 